Merge pull request idaholab#16272 from crswong888/step07-15232
Tutorial 1: Step 7
GiudGiud authored Nov 24, 2020
2 parents 455bf9e + 09de5f4 commit db57ca9
Showing 10 changed files with 164 additions and 11 deletions.
2 changes: 1 addition & 1 deletion framework/doc/acronyms.yml
@@ -20,7 +20,7 @@ JSON: JavaScript Object Notation
LGPL: GNU Lesser General Public License
MMS: Method of Manufactured Solutions
MWR: Method of Mean Weighted Residuals
-MPI: Method Passing Interface
+MPI: Message Passing Interface
MOOSE: Multiphysics Object Oriented Simulation Environment
NE: Nuclear Energy
NQA-1: Nuclear Quality Assurance Level 1
1 change: 1 addition & 0 deletions framework/doc/globals.yml
@@ -1,4 +1,5 @@
libMesh: http://libmesh.github.io/
+PETSc: https://www.mcs.anl.gov/petsc/
MOOSE: http://www.mooseframework.org
YAML: http://yaml.org/
python: https://www.python.org/
2 changes: 1 addition & 1 deletion large_media
@@ -64,8 +64,6 @@ Ran 1 tests in 0.3 seconds.

Later in this tutorial, the testing system will be explored in greater detail and tests will be created for the Babbler application.

*For more information about the MOOSE testing system, please visit the [application_development/test_system.md] page.*

## Enable Use of GitHub id=git

[Git](https://git-scm.com) is a version control system that enables teams of software developers to manage contributions to a single code base. When using Git, a `commit` is an update to the repository that marks a checkpoint to be revisited even after further changes are made. A repository's *commit log* shows the history of commits, and helps track the progression of code. A `push` uploads the local version of the repository to the remote (online) one.
@@ -1,4 +1,4 @@
-# Step 2: Creating an Input File
+# Step 2: Write an Input File

In this step, the concept of an input file is introduced. These files provide the means for controlling [!ac](FE) simulations with MOOSE. To demonstrate this concept, a steady-state diffusion of pressure from one end of the pipe, between the pressure vessels, to the other (see the [tutorial01_app_development/problem_statement.md] page) will be considered. The goal here is to create an input file that solves this simple [!ac](BVP). This problem is detailed in the [#demo] section, but, first, some basic information regarding input files and their execution is provided. As with many steps of this tutorial, concepts will be introduced and a hands-on demonstration will follow.

@@ -1,4 +1,4 @@
-# Step 4: Generating a Weak Form
+# Step 4: Generate a Weak Form

The first question to ask when presented with a [!ac](PDE) that governs a problem's physics is: "How do I solve this equation?" The MOOSE answer to this question is to use [Galerkin's Method](#galerkin), which involves expressing the *strong form* of a governing [!ac](PDE) in its *weak form*.

@@ -1,4 +1,4 @@
-# Step 5: Creating a Kernel Object
+# Step 5: Develop a Kernel Object

In this step, the basic components of [#kernels] will be presented. To demonstrate their use, a new `Kernel` will be created to solve Darcy's Pressure equation, whose weak form was derived in the [previous step](tutorial01_app_development/step04_weak_form.md#demo). The concept of class *inheritance* shall also be demonstrated, as the object to solve Darcy's equation will inherit from the `ADKernel` class.

@@ -1,6 +1,154 @@
# Step 7: Execute in Parallel

-!alert construction
-The remainder of this tutorial is currently being developed. More content should be available soon. For now, refer back to the [examples_and_tutorials/index.md] page for other helpful training materials or check out the MOOSE [application_development/index.md] pages for more information.

A major objective of MOOSE is performance. This step briefly introduces parallel processing, demonstrates the basic commands used for running an application in parallel, and gives a few basic tips on how to evaluate and improve performance.

## MOOSE Multiprocessing

There are two types of parallelism supported by MOOSE: [multiprocessing](https://en.wikipedia.org/wiki/Multiprocessing) and [multithreading](https://en.wikipedia.org/wiki/Thread_(computing%29). At its core, MOOSE is designed to run in parallel using the [Message Passing Interface](https://en.wikipedia.org/wiki/Message_Passing_Interface) protocol. [!ac](MPI) is a library of programming tools for accessing hardware and controlling how multiple CPUs exchange information while working simultaneously to run a single computer program. Shared memory parallelism is also supported through various threading libraries and can be used in conjunction with [!ac](MPI).

The general approach to solving a [!ac](FE) simulation in parallel is to partition the mesh and run an individual process that assembles and solves the system of equations for each of those mesh partitions. In general, the duration of the solve decreases as the number of CPUs increases.

### Basic Commands id=commands

The `mpiexec` command is used to execute a MOOSE-based application using [!ac](MPI). For example,
the tutorial application can be executed as follows, where `-n 4` is an argument supplied to
the `mpiexec` command indicating that 4 processes should be used for execution.

```bash
cd ~/projects/babbler
mpiexec -n 4 ./babbler-opt -i test/tests/kernels/simple_diffusion/simple_diffusion.i
```

For most cases, using [!ac](MPI) alone is the best course of action. If threading is desired, it may
be enabled using the `--n-threads` option, which is supplied directly to the application executable.
For example, the following runs the Babbler application with 4 threads.

```bash
cd ~/projects/babbler
./babbler-opt -i test/tests/kernels/simple_diffusion/simple_diffusion.i --n-threads 4
```

As mentioned, it is possible to run using both [!ac](MPI) and threading. This is accomplished
by combining the two methods described above.
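
For example, the following runs the Babbler application with 4 [!ac](MPI) processes, each using 2 threads, by simply combining the `mpiexec` and `--n-threads` syntax shown above:

```bash
cd ~/projects/babbler
mpiexec -n 4 ./babbler-opt -i test/tests/kernels/simple_diffusion/simple_diffusion.i --n-threads 2
```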

!alert tip title=Optimum numbers are hardware and problem dependent
The number of processors and threads available for execution is hardware dependent. A modern laptop
typically has 4 processors, with 2 threads each. In general, it is recommended to begin with
just [!ac](MPI); thus, it is typical to use between 4 and 8 processes for the `mpiexec`
command. If threading is added, then using 4 processes for [!ac](MPI) with 2 threads each would be
typical. The optimum arrangement for parallel execution is hardware and problem dependent, so it
may be worthwhile to explore different arrangements before running a full-scale problem.
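
To determine how many processing units are actually available on a given machine, standard system utilities may be used; for example, on Linux or macOS, respectively:

```bash
nproc              # Linux: report the number of available processing units
sysctl -n hw.ncpu  # macOS: report the number of logical CPUs
```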

!alert note title=Parallel Execution in Peacock
In the "Execute" tab of Peacock, the `mpiexec` and `--n-threads` options can be used by selecting the "Use MPI" and "Use Threads" checkboxes and specifying the command syntax. These options can be set and enabled by default in the PEACOCK preferences.

*For more information about command-line options, please visit the [application_usage/command_line_usage.md] page.*

### Model Setup id=model-setup

The [Mesh] System in MOOSE provides several strategies for configuring a [!ac](FE) model to be solved in parallel. Most end-users won't have to alter the default settings. Even application developers need not worry about writing parallel code, since this is handled by the core systems of MOOSE, [libMesh], and [PETSc]. However, advanced users are likely to encounter situations in which the default parallelization techniques are not suitable for the problem they are solving. Such situations are beyond the scope of this tutorial and interested readers may refer to the following for more information:

- [syntax/Mesh/Partitioner/index.md]
- [syntax/Mesh/index.md#replicated-and-distributed-mesh]
- [syntax/Mesh/splitting.md]
- [source/partitioner/PetscExternalPartitioner.md]
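
As a brief illustration of what such customization looks like, a partitioner can be selected from within the `[Mesh]` block of an input file. The following is only a minimal sketch, assuming the framework's `LinearPartitioner` type; the pages listed above provide guidance on choosing an appropriate strategy:

```
[Mesh]
  # ... the usual mesh settings for the problem ...
  [Partitioner]
    type = LinearPartitioner
  []
[]
```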


### Evaluating and Enhancing Performance

MOOSE includes a tool for evaluating performance: [PerfGraphOutput.md]. This enables a report to be printed to the terminal that details the amount of time spent processing different parts of the program, as well as the total execution time. By evaluating performance reports, the ideal [model setup](#model-setup) and [parallel execution approach](#commands) can be found. This feature can be enabled in an input file, as follows, or from the command line using the `--timing` option.

```
[Outputs]
  perf_graph = true
[]
```

There is an entire field of science devoted to [!ac](HPC) and massively parallel processing. Although it is a valuable one, a formal discussion cannot be provided here. One concept worth mentioning is [scalable parallelism](https://en.wikipedia.org/wiki/Scalable_parallelism), which refers to software that performs at the same level for larger problems that use more processes as it does for smaller problems that use fewer processes. In MOOSE, selecting the number of processes based on the number of [!ac](DOFs) in the system is a simple way to try to achieve scalability.

!alert tip title=Try to target 20,000 [!ac](DOFs)-per-process
MOOSE developers tend to agree that 20,000 is the ideal number of [!ac](DOFs) that a single process may be responsible for. This value is reported as "`Num Local DOFs`" in the terminal printout at the beginning of every execution.
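
As a quick worked example of this rule of thumb: the refined problem used in the demonstration later in this step has 257,761 [!ac](DOFs), and 257,761 / 20,000 ≈ 13, so roughly 13 processes would be a reasonable starting point:

```bash
cd ~/projects/babbler/problems
mpiexec -n 13 ./babbler-opt -i pressure_diffusion.i -r 4 --timing
```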

*For more information about application performance, please visit the [application_development/performance_benchmarking.md] page.*

## Demonstration

To demonstrate the importance of parallel execution, the current Darcy pressure input file will be
utilized, but with two additional command-line options applied. First, the performance
information shall be included using the `--timing` option; second, the mesh will be uniformly
refined 4 times to make the problem large enough for analysis.

```bash
cd ~/projects/babbler/problems
./babbler-opt -i pressure_diffusion.i -r 4 --timing
```

!alert warning title=Use less refinement for older hardware
Running this problem with 4 levels of refinement may be too much for older systems. It is still
possible to follow along with this example using fewer levels of refinement.

The `-r 4` option will split each quadrilateral element into 4 elements, 4 times; therefore, the
resulting mesh will be 4^4^ = 256 times larger. The original input file results in 1,000 elements, thus
the version executed with this command contains 256,000 elements. This change is evident in the
mesh section of the terminal output. In addition, the number of [!ac](DOFs) is reported, which is
the important number to consider when selecting the number of processes.

```
Nonlinear System:
  AD size required: 4
  Num DOFs: 257761
  Num Local DOFs: 257761
  Num Partitions: 1
```

The number to consider is the number of local [!ac](DOFs), which is the number of [!ac](DOFs) on
the root process and is roughly equivalent to the number on each of the other processes. In
addition, the performance information will be printed at the end of the simulation.


```
Performance Graph:
--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Section | Calls | Self(s) | Avg(s) | % | Children(s) | Avg(s) | % | Total(s) | Avg(s) | % |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
| BabblerTestApp (main) | 1 | 0.006 | 0.006 | 0.04 | 15.048 | 15.048 | 99.96 | 15.054 | 15.054 | 100.00 |
| FEProblem::outputStep | 2 | 0.001 | 0.000 | 0.00 | 0.708 | 0.354 | 4.70 | 0.708 | 0.354 | 4.71 |
| Steady::PicardSolve | 1 | 0.000 | 0.000 | 0.00 | 7.463 | 7.463 | 49.57 | 7.463 | 7.463 | 49.57 |
| FEProblem::solve | 1 | 1.111 | 1.111 | 7.38 | 6.351 | 6.351 | 42.19 | 7.462 | 7.462 | 49.57 |
| FEProblem::computeResidualInternal | 4 | 0.000 | 0.000 | 0.00 | 1.753 | 0.438 | 11.64 | 1.753 | 0.438 | 11.64 |
| FEProblem::computeJacobianInternal | 2 | 0.000 | 0.000 | 0.00 | 4.598 | 2.299 | 30.54 | 4.598 | 2.299 | 30.54 |
| FEProblem::outputStep | 1 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.00 |
| Steady::final | 1 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.00 |
| FEProblem::outputStep | 1 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.00 | 0.000 | 0.000 | 0.00 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
```

The report indicates that the total duration of the execution was approximately 15 seconds (obviously,
this will vary depending on hardware) and that the solve time was approximately 7.5 seconds.

To test the parallel scaling of this [!ac](FE) model, it can be executed with an increasing number
of processes. For example, the following executes the same problem with two processes. If the
problem scales well, then the +solve time+ should be expected to be roughly twice as fast.

```bash
cd ~/projects/babbler/problems
mpiexec -n 2 ./babbler-opt -i pressure_diffusion.i -r 4 --timing
```

The data presented in [scale] show decreasing solve time as the number of processes increases.
This problem was executed on a 2019 Mac Pro with a 2.5 GHz 28-Core Intel Xeon W. For perfect
scaling, the 8-core run should be 8 times faster than the serial execution. Of course, perfect
scaling is not possible due to the necessity of performing parallel communication during the solve.

!table id=scale caption=Problem solve time with increasing numbers of processors.
| Num. Processors | Local [!ac](DOFs) | Solve Time (sec.) |
| - | - | - |
| 1 | 257,761 | 7.5 |
| 2 | 128,968 | 4.0 |
| 4 | 64,575 | 2.1 |
| 8 | 32,382 | 1.2 |
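
A simple check on these numbers is to compute the parallel speedup, i.e., the serial solve time divided by the parallel solve time, along with the corresponding parallel efficiency. For the 8-process run in [scale]:

!equation
S_8 = \frac{7.5\,\text{sec}}{1.2\,\text{sec}} \approx 6.3, \qquad E_8 = \frac{S_8}{8} \approx 0.78

A measured speedup of about 6.3, compared to the perfect speedup of 8, reflects the cost of the parallel communication mentioned above.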

In practice, a single process is sufficient for any MOOSE [!ac](FE) problem that has fewer than 20,000 total [!ac](DOFs).

!content pagination previous=tutorial01_app_development/step06_input_params.md
next=tutorial01_app_development/step08_test_harness.md
@@ -0,0 +1,6 @@
# Step 8: Write a Regression Test

!alert construction
The remainder of this tutorial is currently being developed. More content should be available soon. For now, refer back to the [examples_and_tutorials/index.md] page for other helpful training materials or check out the MOOSE [application_development/index.md] pages for more information.

!content pagination previous=tutorial01_app_development/step07_parallel.md
4 changes: 2 additions & 2 deletions python/MooseDocs/extensions/core.py
@@ -351,9 +351,9 @@ def createToken(self, parent, info, page):

        # Sub/super script must have word before the rest cannot
        if (tok == '^') or (tok == '@'):
-            if not parent.children or (not parent.children[-1].name == 'Word'):
+            if not parent.children or (parent.children[-1].name not in ('Word', 'Number')):
                return None
-        elif parent.children and (parent.children[-1].name == 'Word'):
+        elif parent.children and (parent.children[-1].name in ('Word', 'Number')):
            return None

        if tok == '@':
