Improve multithread support #188
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #188      +/-   ##
==========================================
- Coverage   88.63%   88.57%    -0.06%
==========================================
  Files          57       57
  Lines        3193     3203       +10
==========================================
+ Hits         2830     2837        +7
- Misses        363      366        +3
==========================================
```

Continue to review full report at Codecov.
Force-pushed from a3b0191 to 806881c.
```diff
@@ -174,8 +174,7 @@ int main(int argc, char *argv[])
 {
   namespace boost_po = boost::program_options;

-  MPI_Init(&argc, &argv);
-  dealii::MultithreadInfo::set_thread_limit(1);
+  dealii::Utilities::MPI::MPI_InitFinalize mpi_init(argc, argv);
```
I noticed that deal.II does not assert that the level of thread support available is sufficient (it does not assert `provided >= wanted`). I have nothing against the change you suggest, even more so since this line is the only direct call to `MPI_Init()` in mfmg, but I am curious whether this actually was a bug. Can you expand on this?

Edit: @masterleinad I realized I had pasted the wrong link.
To request explicit thread support we need to call `MPI_Init_thread` instead, which is what `dealii::Utilities::MPI::MPI_InitFinalize` does (apart from initializing Zoltan and emptying memory pools). `MPI_Init` seems not to allow calling MPI functions from multiple threads and hence behaves like `MPI_Init_thread` with `MPI_THREAD_SINGLE` (or possibly `MPI_THREAD_FUNNELED`).
In the end, I was just seeing MPI errors when calling MPI functions (inside the `MatrixFree` constructor, to be precise) when running multithreaded. Using `MPI_Init` instead of `MPI_Init_thread` or `dealii::Utilities::MPI::MPI_InitFinalize` was likely just an oversight.
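For illustration, here is a minimal standalone sketch (not mfmg code) of requesting a thread level with `MPI_Init_thread` and making exactly the `provided >= wanted` check discussed above, which plain `MPI_Init()` cannot express:

```cpp
// Minimal sketch: request full multithreaded MPI support and verify
// that the library actually provides it.
#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[])
{
  int provided = 0;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  // The `provided >= wanted` assertion that MPI_Init() skips.
  if (provided < MPI_THREAD_MULTIPLE)
  {
    std::fprintf(stderr, "insufficient MPI thread support: %d\n", provided);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
  // From here on, calling MPI functions from multiple threads is
  // permitted by the MPI standard.
  MPI_Finalize();
  return 0;
}
```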
Updated link to deal.II code
```cpp
AgglomerateOperator agglomerate_operator(*dealii_mesh_evaluator,
                                         agglomerate_dof_handler,
                                         agglomerate_constraints);
agglomerate_operator.vmult(correction, delta_eig);
```
These changes look reasonable but are lost in all the code formatting mess.
```cpp
{
  ASSERT_THROW_NOT_IMPLEMENTED();
  return std::make_unique<DealIIMatrixFreeMeshEvaluator>(*this);
}
```
Please remind me why this cannot be pure virtual.
We need to discuss why this is OK. My objections are: we force the user to implement boilerplate code, and we have no way of knowing how much data they stuffed into their derived class, so the copy could be costly.
Also, I would like to understand where the race condition happens :/
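For reference, a self-contained sketch of the `clone()` idiom under discussion (the names are illustrative, not mfmg's actual interface). Making `clone()` pure virtual forces exactly this boilerplate onto every user-derived evaluator:

```cpp
#include <memory>

// Illustrative base class standing in for the mesh evaluator interface.
struct Evaluator
{
  virtual ~Evaluator() = default;
  // Pure virtual: every derived class must provide its own deep copy.
  virtual std::unique_ptr<Evaluator> clone() const = 0;
};

struct UserEvaluator final : public Evaluator
{
  // The boilerplate in question. It copies whatever data members the
  // user put into the derived class, so its cost is unbounded from the
  // library's point of view.
  std::unique_ptr<Evaluator> clone() const override
  {
    return std::make_unique<UserEvaluator>(*this);
  }
};

int main()
{
  UserEvaluator original;
  auto copy = original.clone(); // one deep copy per agglomerate
}
```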
This would be much easier if we had a uniform interface for global/agglomerate initialization and evaluation, or separate user classes.
```
/usr/include/c++/7/ext/new_allocator.h:136:4: error: invalid new-expression of abstract class type ‘TestMeshEvaluator<mfmg::DealIIMatrixFreeMeshEvaluator<2> >’
  { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from mfmg/tests/test_hierarchy.cc:37:0:
mfmg/tests/test_hierarchy_helpers.hpp:202:7: note: because the following virtual functions are pure within ‘TestMeshEvaluator<mfmg::DealIIMatrixFreeMeshEvaluator<2> >’:
 class TestMeshEvaluator final : public MeshEvaluator
       ^~~~~~~~~~~~~~~~~
In file included from mfmg/include/mfmg/dealii/amge_host.hpp:16:0,
                 from mfmg/include/mfmg/dealii/dealii_hierarchy_helpers.hpp:16,
                 from mfmg/include/mfmg/common/hierarchy.hpp:19,
                 from mfmg/tests/test_hierarchy.cc:14:
mfmg/include/mfmg/dealii/dealii_matrix_free_mesh_evaluator.hpp:63:58: note: std::unique_ptr<mfmg::DealIIMatrixFreeMeshEvaluator<dim> > mfmg::DealIIMatrixFreeMeshEvaluator<dim>::clone() const [with int dim = 2]
   virtual std::unique_ptr<DealIIMatrixFreeMeshEvaluator> clone() const = 0;
                                                          ^~~~~
```
I am not sure if we need to modify the matrix-based version as well.
The matrix-based version needs more work, so I restricted the number of threads to one in that case again. To actually check the matrix-free version here, I also changed the default to "matrix-free" (forcing a Chebyshev smoother since that is the only possibility).
So the problem is that …
No, that is not the problem. So far, we had a common …
Compared to #192, timings for running with one thread are unaffected. These are the run times on my notebook: …
For the 4x4 patch and one thread, I get …, while 8 threads give me …
Using vectorization (AVX2) on top of that gives me … (1 thread) and … (8 threads).
Using MPI instead of threads gives … (1 MPI process) and … (8 MPI processes).
Force-pushed from eb1b2ad to ab5bf1d.
Does anyone want to discuss a different solution?
@dalg24 said he was going to look at it.
```diff
@@ -12,6 +12,7 @@
 #include <mfmg/common/instantiation.hpp>
 #include <mfmg/common/operator.hpp>
 #include <mfmg/dealii/amge_host.hpp>
+#include <mfmg/dealii/amge_host.templates.hpp>
```
Comment that it is included for `MatrixFreeAgglomerateOperator`, and that it would probably not be a bad idea to move the definition elsewhere.
```diff
@@ -56,6 +56,16 @@ class DealIIMatrixFreeMeshEvaluator : public DealIIMeshEvaluator<dim>
    */
   virtual ~DealIIMatrixFreeMeshEvaluator() override = default;

+  /**
+   * Create a deep copy of this class such that initializing on another
+   * agglomerate works.
```
Please add a note that this was introduced because calls to the member functions `matrix_free_initialize_agglomerate()` and `matrix_free_evaluate_agglomerate()` (implemented in the user-provided class deriving from `MatrixFreeMeshEvaluator`) are not thread-safe, and that it is not the only option to solve the problem.
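One possible wording of that note, sketched here as a Doxygen comment (an assumption about phrasing, not the wording that was actually merged):

```cpp
/**
 * Create a deep copy of this class such that initializing on another
 * agglomerate works.
 *
 * Introduced because calls to matrix_free_initialize_agglomerate() and
 * matrix_free_evaluate_agglomerate(), implemented in the user-provided
 * class deriving from MatrixFreeMeshEvaluator, are not thread-safe when
 * a single evaluator object is shared across threads. Cloning the
 * evaluator per agglomerate avoids the race, but it is not the only
 * possible fix.
 */
virtual std::unique_ptr<DealIIMatrixFreeMeshEvaluator> clone() const = 0;
```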
Thanks!
Part of #185. This pull request allows `hierarchy_driver` to run multithreaded. The main modification is that we have to copy the `Evaluator` for each agglomerate to avoid race conditions. I have no idea why the formatting in `include/mfmg/dealii/dealii_mesh_evaluator.hpp` changed that much.