Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve multithread support #188

Merged
merged 7 commits into from
Jun 10, 2019
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 7 additions & 5 deletions include/mfmg/dealii/amge_host.templates.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -62,10 +62,10 @@ struct MatrixFreeAgglomerateOperator
MatrixFreeAgglomerateOperator(MeshEvaluator const &mesh_evaluator,
DoFHandler &dof_handler,
dealii::AffineConstraints<double> &constraints)
: _mesh_evaluator(mesh_evaluator), _dof_handler(dof_handler),
: _mesh_evaluator(mesh_evaluator.clone()), _dof_handler(dof_handler),
_constraints(constraints)
{
_mesh_evaluator.matrix_free_initialize_agglomerate(dof_handler);
_mesh_evaluator->matrix_free_initialize_agglomerate(dof_handler);
}

/**
Expand All @@ -74,7 +74,7 @@ struct MatrixFreeAgglomerateOperator
void vmult(dealii::Vector<double> &dst,
dealii::Vector<double> const &src) const
{
_mesh_evaluator.matrix_free_evaluate_agglomerate(src, dst);
_mesh_evaluator->matrix_free_evaluate_agglomerate(src, dst);
}

/**
Expand All @@ -85,7 +85,7 @@ struct MatrixFreeAgglomerateOperator
*/
std::vector<double> get_diag_elements() const
{
return _mesh_evaluator.matrix_free_get_agglomerate_diagonal(_constraints);
return _mesh_evaluator->matrix_free_get_agglomerate_diagonal(_constraints);
}

/**
Expand All @@ -102,7 +102,7 @@ struct MatrixFreeAgglomerateOperator
/**
* The actual operator wrapped.
*/
MeshEvaluator const &_mesh_evaluator;
std::unique_ptr<MeshEvaluator const> const _mesh_evaluator;

/**
* The dimension for the underlying mesh.
Expand Down Expand Up @@ -462,6 +462,7 @@ void AMGe_host<dim, MeshEvaluator, VectorType>::setup_restrictor(
std::vector<std::vector<dealii::types::global_dof_index>> dof_indices_maps;
std::vector<unsigned int> n_local_eigenvectors;
CopyData copy_data;

dealii::WorkStream::run(
agglomerate_ids.begin(), agglomerate_ids.end(),
[&](std::vector<unsigned int>::iterator const &agg_id,
Expand Down Expand Up @@ -518,6 +519,7 @@ void AMGe_host<dim, MeshEvaluator, VectorType>::setup_restrictor(
std::vector<std::vector<dealii::types::global_dof_index>> dof_indices_maps;
std::vector<unsigned int> n_local_eigenvectors;
CopyData copy_data;

dealii::WorkStream::run(
agglomerate_ids.begin(), agglomerate_ids.end(),
[&](std::vector<unsigned int>::iterator const &agg_id,
Expand Down
10 changes: 10 additions & 0 deletions include/mfmg/dealii/dealii_matrix_free_mesh_evaluator.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,16 @@ class DealIIMatrixFreeMeshEvaluator : public DealIIMeshEvaluator<dim>
*/
virtual ~DealIIMatrixFreeMeshEvaluator() override = default;

/**
* Create a deep copy of this class such that initializing on another
* agglomerate works.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a note that this was introduced because calls to member functions matrix_free_initialize_agglomerate() and matrix_free_evaluate_agglomerate() (implemented in user provided class deriving from MatrixFreeMeshEvaluator) are not thread safe and that it is not the only option to solve the problem.

*/
virtual std::unique_ptr<DealIIMatrixFreeMeshEvaluator> clone() const
{
ASSERT_THROW_NOT_IMPLEMENTED();
return std::make_unique<DealIIMatrixFreeMeshEvaluator>(*this);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remind me why this cannot be pure virtual.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to discuss why this is OK.

My objections are: we force the user to implement to boiler plate code, we have no way of knowing how much data he stuffed into his derived class and this could be costly.
Also I would like to understand where the race condition happen :/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be much easier if we have a uniform interface for global/agglomerate initialization and evaluation or separate user classes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/usr/include/c++/7/ext/new_allocator.h:136:4: error: invalid new-expression of abstract class type ‘TestMeshEvaluator<mfmg::DealIIMatrixFreeMeshEvaluator<2> >’
  { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from mfmg/tests/test_hierarchy.cc:37:0:
mfmg/tests/test_hierarchy_helpers.hpp:202:7: note:   because the following virtual functions are pure within ‘TestMeshEvaluator<mfmg::DealIIMatrixFreeMeshEvaluator<2> >’:
 class TestMeshEvaluator final : public MeshEvaluator
       ^~~~~~~~~~~~~~~~~
In file included from mfmg/include/mfmg/dealii/amge_host.hpp:16:0,
                 from mfmg/include/mfmg/dealii/dealii_hierarchy_helpers.hpp:16,
                 from mfmg/include/mfmg/common/hierarchy.hpp:19,
                 from mfmg/tests/test_hierarchy.cc:14:
mfmg/include/mfmg/dealii/dealii_matrix_free_mesh_evaluator.hpp:63:58: note: 	std::unique_ptr<mfmg::DealIIMatrixFreeMeshEvaluator<dim> > mfmg::DealIIMatrixFreeMeshEvaluator<dim>::clone() const [with int dim = 2]
   virtual std::unique_ptr<DealIIMatrixFreeMeshEvaluator> clone() const = 0;
                                                          ^~~~~

I am not sure if we need to modify the matrix-based version as well.


/**
* Return the class name as std::string.
*/
Expand Down
164 changes: 83 additions & 81 deletions source/dealii/dealii_matrix_free_hierarchy_helpers.cc
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
#include <mfmg/common/instantiation.hpp>
#include <mfmg/common/operator.hpp>
#include <mfmg/dealii/amge_host.hpp>
#include <mfmg/dealii/amge_host.templates.hpp>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment it is included for MatrixFreeAgglomerateOperator and that it would probably not be a bad idea to move the definition elsewhere.

#include <mfmg/dealii/dealii_matrix_free_hierarchy_helpers.hpp>
#include <mfmg/dealii/dealii_matrix_free_mesh_evaluator.hpp>
#include <mfmg/dealii/dealii_matrix_free_operator.hpp>
Expand Down Expand Up @@ -83,6 +84,7 @@ DealIIMatrixFreeHierarchyHelpers<dim, VectorType>::build_restrictor(
std::unique_ptr<dealii::TrilinosWrappers::SparseMatrix> eigenvector_matrix;
std::unique_ptr<dealii::TrilinosWrappers::SparseMatrix>
delta_eigenvector_matrix;

amge.setup_restrictor(agglomerate_params, n_eigenvectors, tolerance,
*dealii_mesh_evaluator, locally_relevant_global_diag,
restrictor_matrix, eigenvector_matrix,
Expand Down Expand Up @@ -125,91 +127,91 @@ DealIIMatrixFreeHierarchyHelpers<dim, VectorType>::build_restrictor(
std::vector<std::vector<dealii::TrilinosScalar>> values_per_row;
} copy_data;

auto worker =
[&](const std::vector<std::vector<unsigned int>>::const_iterator
&agglomerate_it,
ScratchData &, CopyData &local_copy_data) {
dealii::Triangulation<dim> agglomerate_triangulation;
std::map<typename dealii::Triangulation<dim>::active_cell_iterator,
typename dealii::DoFHandler<dim>::active_cell_iterator>
patch_to_global_map;
amge.build_agglomerate_triangulation(*agglomerate_it,
agglomerate_triangulation,
patch_to_global_map);
if (patch_to_global_map.empty())
auto worker = [&](const std::vector<std::vector<unsigned int>>::
const_iterator &agglomerate_it,
ScratchData &, CopyData &local_copy_data) {
dealii::Triangulation<dim> agglomerate_triangulation;
std::map<typename dealii::Triangulation<dim>::active_cell_iterator,
typename dealii::DoFHandler<dim>::active_cell_iterator>
patch_to_global_map;
amge.build_agglomerate_triangulation(
*agglomerate_it, agglomerate_triangulation, patch_to_global_map);
if (patch_to_global_map.empty())
{
return;
}

// Now that we have the triangulation, we can do the evaluation on
// the agglomerate
dealii::DoFHandler<dim> agglomerate_dof_handler(
agglomerate_triangulation);
agglomerate_dof_handler.distribute_dofs(
dealii_mesh_evaluator->get_dof_handler().get_fe());

// Put the result in the matrix
// Compute the map between the local and the global dof indices.
local_copy_data.rows.resize(n_local_eigenvectors);

local_copy_data.cols = amge.compute_dof_index_map(
patch_to_global_map, agglomerate_dof_handler);
auto const &dof_indices_map = local_copy_data.cols;
unsigned int const n_elem = dof_indices_map.size();

// We need a clean reset for the values we are going to store.
// Otherwise, we would accumulate values across patches
// corresponding to different degrees of freedom.
local_copy_data.values_per_row.resize(n_local_eigenvectors);
std::fill(local_copy_data.values_per_row.begin(),
local_copy_data.values_per_row.end(),
std::vector<dealii::TrilinosScalar>(n_elem));

unsigned int const i = agglomerate_it - agglomerates_vector.begin();

for (unsigned int j = 0; j < n_local_eigenvectors; ++j)
{
unsigned int const local_row = i * n_local_eigenvectors + j;
unsigned int const global_row =
eigenvector_matrix->locally_owned_range_indices()
.nth_index_in_set(local_row);
// Get the vector used for the matrix-vector multiplication
dealii::Vector<ScalarType> delta_eig(n_elem);
if (is_halo_agglomerate)
{
for (unsigned int k = 0; k < n_elem; ++k)
{
return;
delta_eig[k] =
delta_eigenvector_matrix->el(global_row, dof_indices_map[k]) +
eigenvector_matrix->el(global_row, dof_indices_map[k]);
}

// Now that we have the triangulation, we can do the evaluation on
// the agglomerate
dealii::DoFHandler<dim> agglomerate_dof_handler(
agglomerate_triangulation);
agglomerate_dof_handler.distribute_dofs(
dealii_mesh_evaluator->get_dof_handler().get_fe());

// Put the result in the matrix
// Compute the map between the local and the global dof indices.
local_copy_data.rows.resize(n_local_eigenvectors);

local_copy_data.cols = amge.compute_dof_index_map(
patch_to_global_map, agglomerate_dof_handler);
auto const &dof_indices_map = local_copy_data.cols;
unsigned int const n_elem = dof_indices_map.size();

// We need a clean reset for the values we are going to store.
// Otherwise, we would accumulate values across patches
// corresponding to different degrees of freedom.
local_copy_data.values_per_row.resize(n_local_eigenvectors);
std::fill(local_copy_data.values_per_row.begin(),
local_copy_data.values_per_row.end(),
std::vector<dealii::TrilinosScalar>(n_elem));

unsigned int const i = agglomerate_it - agglomerates_vector.begin();

for (unsigned int j = 0; j < n_local_eigenvectors; ++j)
}
else
{
for (unsigned int k = 0; k < n_elem; ++k)
{
unsigned int const local_row = i * n_local_eigenvectors + j;
unsigned int const global_row =
eigenvector_matrix->locally_owned_range_indices()
.nth_index_in_set(local_row);
// Get the vector used for the matrix-vector multiplication
dealii::Vector<ScalarType> delta_eig(n_elem);
if (is_halo_agglomerate)
{
for (unsigned int k = 0; k < n_elem; ++k)
{
delta_eig[k] =
delta_eigenvector_matrix->el(global_row,
dof_indices_map[k]) +
eigenvector_matrix->el(global_row, dof_indices_map[k]);
}
}
else
{
for (unsigned int k = 0; k < n_elem; ++k)
{
delta_eig[k] = delta_eigenvector_matrix->el(
global_row, dof_indices_map[k]);
}
}

// Perform the matrix-vector multiplication
dealii::Vector<ScalarType> correction(n_elem);
dealii_mesh_evaluator->matrix_free_initialize_agglomerate(
agglomerate_dof_handler);
dealii_mesh_evaluator->matrix_free_evaluate_agglomerate(
delta_eig, correction);

// Store the values the delta correction matrix is to be filled
// with.
local_copy_data.rows[j] = global_row;
std::transform(correction.begin(), correction.end(),
local_copy_data.values_per_row[j].begin(),
local_copy_data.values_per_row[j].begin(),
std::plus<double>());
delta_eig[k] =
delta_eigenvector_matrix->el(global_row, dof_indices_map[k]);
}
};
}

// Perform the matrix-vector multiplication
dealii::Vector<ScalarType> correction(n_elem);
dealii::AffineConstraints<double> agglomerate_constraints;
using AgglomerateOperator =
MatrixFreeAgglomerateOperator<DealIIMatrixFreeMeshEvaluator<dim>>;
AgglomerateOperator agglomerate_operator(*dealii_mesh_evaluator,
agglomerate_dof_handler,
agglomerate_constraints);
agglomerate_operator.vmult(correction, delta_eig);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look reasonable but are lost in all the code formatting mess.


// Store the values the delta correction matrix is to be filled
// with.
local_copy_data.rows[j] = global_row;
std::transform(correction.begin(), correction.end(),
local_copy_data.values_per_row[j].begin(),
local_copy_data.values_per_row[j].begin(),
std::plus<double>());
}
};

auto copier = [&](const CopyData &local_copy_data) {
for (unsigned int i = 0; i < local_copy_data.rows.size(); ++i)
Expand Down
9 changes: 3 additions & 6 deletions tests/hierarchy_driver.cc
Original file line number Diff line number Diff line change
Expand Up @@ -217,8 +217,7 @@ int main(int argc, char *argv[])
{
namespace boost_po = boost::program_options;

MPI_Init(&argc, &argv);
dealii::MultithreadInfo::set_thread_limit(1);
dealii::Utilities::MPI::MPI_InitFinalize mpi_init(argc, argv);
Copy link
Collaborator

@dalg24 dalg24 May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that deal.II does not assert that the level of thread support available is sufficient (does not assert provided >= wanted)

I have nothing against the change you suggest, even more so since this line is the only direct call to MPI_Init() in mfmg, but I am curious if this actually was a bug. Can you expand on this?

edit @masterleinad I realized I had pasted the wrong link

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For requesting explicit thread support we need to call MPI_Init_thread instead which dealii::Utilities::MPI::MPI_InitFinalize does (apart from initializing Zoltan and emptying memory pools).

MPI_Init seems to not allow calling MPI functions from multiple threads and hence behaves similar to MPI_Init_thread with MPI_THREAD_SINGLE (or possibly MPI_THREAD_FUNNELED).

In the end, I was just seeing MPI errors when calling MPI functions (inside the MatrixFree constructior to be precise) when running multithreaded. Using MPI_Init instead of MPI_Init_thread or dealii::Utilities::MPI::MPI_InitFinalize was likely just an oversight.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated link to deal.II code


boost_po::options_description cmd("Available options");
cmd.add_options()("help,h", "produce help message");
Expand Down Expand Up @@ -250,7 +249,7 @@ int main(int argc, char *argv[])
dim = vm["dim"].as<int>();
mfmg::ASSERT(dim == 2 || dim == 3, "Dimension must be 2 or 3");

bool matrix_free = false;
bool matrix_free = true;
if (vm.count("matrix_free"))
matrix_free = vm["matrix_free"].as<bool>();

Expand All @@ -275,7 +274,6 @@ int main(int argc, char *argv[])
if (matrix_free)
{
params->put("smoother.type", "Chebyshev");

if (dim == 2)
{
switch (fe_degree)
Expand Down Expand Up @@ -419,13 +417,12 @@ int main(int argc, char *argv[])
}
else
{
dealii::MultithreadInfo::set_thread_limit(1);
if (dim == 2)
matrix_based_two_grids<2>(params);
else
matrix_based_two_grids<3>(params);
}

MPI_Finalize();

return 0;
}
2 changes: 2 additions & 0 deletions tests/main.cc
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@ bool init_function() { return true; }

int main(int argc, char *argv[])
{
// Set the maximum number of threads used to the minimum of the number of
// cores reported by TBB and the environment variable DEAL_II_NUM_THREADS.
dealii::Utilities::MPI::MPI_InitFinalize mpi_init(
argc, argv, dealii::numbers::invalid_unsigned_int);

Expand Down
12 changes: 11 additions & 1 deletion tests/test_hierarchy.cc
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,15 @@
#include "main.cc"
#include "test_hierarchy_helpers.hpp"

// In a lot of places in this file, we explicitly set the number of threads to
// use since some of the tests cases don't work with multiple threads yet. This
// works as follow: dealii::MultithreadInfo::set_thread_limit sets the number of
// threads to the minimum its the argument and the environment variable
// DEAL_II_NUM_THREADS. TBB is asked for the maximum nuber of cores if the
// minimum is dealii::numbers::invalid_unsigned_int. Consequently,
// MultithreadInfo::set_thread_limit(numbers::invalid_unsigned int) sets the
// number of threads to the ones specified in the environment variable.

namespace bdata = boost::unit_test::data;
namespace tt = boost::test_tools;

Expand Down Expand Up @@ -203,7 +212,8 @@ BOOST_AUTO_TEST_CASE(benchmark)

BOOST_AUTO_TEST_CASE(benchmark_mf)
{
dealii::MultithreadInfo::set_thread_limit(1);
dealii::MultithreadInfo::set_thread_limit(
dealii::numbers::invalid_unsigned_int);
masterleinad marked this conversation as resolved.
Show resolved Hide resolved

auto params = std::make_shared<boost::property_tree::ptree>();
boost::property_tree::info_parser::read_info("hierarchy_input.info", *params);
Expand Down
19 changes: 19 additions & 0 deletions tests/test_hierarchy_helpers.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -309,8 +309,27 @@ class TestMFMeshEvaluator final
{
}

// We need a copy constructor since the evaluator has to cloned for each
// agglomerate. Here, we just copy the minimum number of member variables
// required. In particular, we should not need to copy mutable members since
// they are set up in matrix_free_initialize_agglomerate only.
TestMFMeshEvaluator(
TestMFMeshEvaluator<dim, fe_degree, ScalarType> const &_other_evaluator)
: mfmg::DealIIMatrixFreeMeshEvaluator<dim>(*this),
_material_property(_other_evaluator._material_property),
_fe(_other_evaluator._fe),
_laplace_operator(_other_evaluator._laplace_operator)
{
}
masterleinad marked this conversation as resolved.
Show resolved Hide resolved

virtual ~TestMFMeshEvaluator() override = default;

virtual std::unique_ptr<mfmg::DealIIMatrixFreeMeshEvaluator<dim>>
clone() const override
{
return std::make_unique<TestMFMeshEvaluator>(*this);
}

virtual void
matrix_free_evaluate_agglomerate(dealii::Vector<double> const &src,
dealii::Vector<double> &dst) const override
Expand Down