MPI_ERRORS_RETURN #13

VictorEijkhout · 2020-09-18T14:45:42Z

My program bombs with

Fatal error in MPI_Waitall: See the MPI_ERROR field in MPI_Status for the error code

Undoubtedly a programming error by me. But I cannot query that status because the code has exited.
What is the MPL equivalent of MPI_Comm_set_errhandler(comm,MPI_ERRORS_RETURN)?

rabauke · 2020-09-18T19:15:39Z

MPL is nothing but a lightweight wrapper around MPI. This means you can actually call the MPI function MPI_Comm_set_errhandler directly (at least with MPI_COMM_WORLD).

(MPL currently has no wrapper for error handling. Because mpl::communicator encapsulates the MPI communicator, you will not be able to access this MPI communicator directly.)

VictorEijkhout · 2020-09-18T22:10:54Z

Maybe add a method that extracts the MPI_Comm from your communicator object? There's quite a number of routines in MPI that attach godknowswhat to a communicator.
The request::error method returns the numerical error. I can of course convert that with MPI_Error_string to a char* but that feels un-C++. Is there a routine that gives me the error text as a std::string?
What does error::what do?

rabauke · 2020-09-27T16:50:27Z

@VictorEijkhout The MPI standard provides rather weak guarantees when it comes to error handling. Therefore, I did not pay much attention to this topic when creating the MPL library on top of MPI. There is certainly room for improvement.

I thought about how MPI's error handling concepts can be incorporated into a nice C++ library. I tried to wrap MPI_Comm_set_errhandler and related functions. I think, however, that the whole concept of error handlers feels very unidiomatic for a C++ library. In some sense, error handlers are a C way of exception handling.

Therefore, I currently plan to implement the following:

All MPI communicators will have the MPI_ERRORS_RETURN error handler attached.
MPL will always check the return values of all calls to MPI functions internally. In case of an error, an exception will be thrown. That is, MPL functions do not return error codes, and there are no API changes to existing MPL functions.
The exception type will derive from std::exception.
The exception's what method will return a string given by MPI_Error_string.
MPL will not provide equivalents to MPI_Comm_set_errhandler and related functions.
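
The plan above could be sketched roughly as follows. This is an illustration under stated assumptions, not MPL's actual implementation: the names mpi_error, error_text_fn and check are hypothetical, and the message lookup is passed in as a callable so that the sketch compiles without an MPI installation. In real code that callable would wrap MPI_Error_string.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <string>
#include <utility>

// Stand-in for MPI_SUCCESS, which the MPI standard defines as 0.
constexpr int success = 0;

// Hypothetical exception type: derives from std::exception and keeps the
// error text so that what() can return it.
class mpi_error : public std::exception {
public:
  mpi_error(int code, std::string text)
      : code_(code), text_(std::move(text)) {}
  const char* what() const noexcept override { return text_.c_str(); }
  int code() const noexcept { return code_; }

private:
  int code_;
  std::string text_;
};

// Hypothetical lookup from error code to message; an assumption made to
// keep this sketch free of MPI headers.
using error_text_fn = std::function<std::string(int)>;

// Checks the return value of a call and throws on failure, so the calling
// wrapper never has to hand an error code back to the user.
inline void check(int return_code, const error_text_fn& describe) {
  assert(static_cast<bool>(describe));  // a lookup callable is required
  if (return_code != success)
    throw mpi_error(return_code, describe(return_code));
}
```

A wrapper function would then wrap each MPI call in check(...), and user code would catch const std::exception& and print e.what().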

VictorEijkhout · 2020-09-27T20:24:23Z

Your plan re: error handling sounds good to me.

My only comment: there is a bunch of other stuff (attribute, name, parent, info) that is attached to a communicator. I'd still appreciate a routine that gives me direct access to the communicator.

rabauke · 2020-09-28T18:40:01Z

In principle, one could include information about the communicator in the exception that is thrown when an error occurs, e.g., via a reference or a pointer to the respective communicator. This approach, however, becomes problematic if the communicator is created within a try-catch block and an error occurs for this communicator. Due to stack unwinding, the communicator is destroyed before the exception is caught. This means that with such an approach the programmer must place their try-catch blocks with care.
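
The lifetime problem can be reproduced without MPI. The sketch below uses a hypothetical communicator stand-in whose destructor clears a flag, and a hypothetical comm_error exception that stores a pointer to it. Neither type is taken from MPL.

```cpp
#include <cassert>
#include <exception>

// Hypothetical stand-in for a communicator wrapper. The external flag lets
// the demo observe when the destructor has run.
struct communicator {
  bool* alive;
  explicit communicator(bool* flag) : alive(flag) { *alive = true; }
  ~communicator() { *alive = false; }
  communicator(const communicator&) = delete;
  communicator& operator=(const communicator&) = delete;
};

// Hypothetical exception that remembers which communicator failed.
struct comm_error : std::exception {
  const communicator* where;
  explicit comm_error(const communicator* c) : where(c) {}
  const char* what() const noexcept override { return "communication failed"; }
};

// Communicator created inside the try block: stack unwinding destroys it
// before the handler runs, so the pointer in the exception dangles.
bool alive_when_created_inside_try() {
  bool alive = false;
  bool alive_when_caught = true;
  try {
    communicator comm(&alive);
    assert(alive);
    throw comm_error(&comm);
  } catch (const comm_error&) {
    alive_when_caught = alive;  // the stored pointer must not be dereferenced here
  }
  return alive_when_caught;
}

// Communicator created outside the try block: it outlives the handler, so
// the pointer in the exception is still valid.
bool alive_when_created_outside_try() {
  bool alive = false;
  bool alive_when_caught = false;
  communicator comm(&alive);
  try {
    throw comm_error(&comm);
  } catch (const comm_error& e) {
    alive_when_caught = alive && e.where == &comm;
  }
  return alive_when_caught;
}
```

The first function returns false and the second returns true, which is why the placement of the try-catch block matters with this design.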

Furthermore, the MPI standard states:

After an error is detected, the state of MPI is undefined. That is, using a user-defined error handler, or MPI_ERRORS_RETURN, does not necessarily allow the user to continue to use MPI after an error is detected. The purpose of these error handlers is to allow a user to issue user-defined error messages and to take actions unrelated to MPI (such as flushing I/O buffers) before a program exits. An MPI implementation is free to allow MPI to continue after an error but is not required to do so.

This means there is no guarantee that one can actually access any of the data (attribute, name, parent, info) that is attached to a communicator after an error has occurred.

Maybe I have to think about this more carefully.

VictorEijkhout · 2023-07-19T13:56:59Z

Your last reply is solely in the context of error handling. There are many other attributes that can be attached to communicators (and files, windows, probably more) that have nothing to do with error handling and that I think ought to be supported. At some point.

rabauke · 2023-07-31T19:15:30Z

@VictorEijkhout: MPL has had an info class for some time, see https://github.com/rabauke/mpl/blob/master/mpl/info.hpp. Furthermore, the communicator class has two info methods for setting and getting an info object. Various communicator creation routines employ an info object. Does this fulfill your needs for attaching attributes to communicators?

VictorEijkhout · 2023-08-04T16:53:39Z

There is MPI_Comm_set_attr, MPI_Comm_set_errhandler, MPI_Comm_set_name which I don't think you can currently support.

I'm not sure that they are very urgent. For instance, I mostly use MPI_Comm_get_attr for the upper bound on tags, which you support in the tag class.

rabauke self-assigned this Nov 29, 2020

rabauke added the api design label Nov 29, 2020
