MPI_ERRORS_RETURN #13

Open
VictorEijkhout opened this issue Sep 18, 2020 · 8 comments

@VictorEijkhout

My program bombs with

Fatal error in MPI_Waitall: See the MPI_ERROR field in MPI_Status for the error code

Undoubtedly a programming error by me. But I cannot query that status because the code has exited.
What is the MPL equivalent of MPI_Comm_set_errhandler(comm,MPI_ERRORS_RETURN)?

@rabauke
Owner

rabauke commented Sep 18, 2020

MPL is nothing but a lightweight wrapper around MPI. This means you can actually use the MPI function MPI_Comm_set_errhandler directly (at least with MPI_COMM_WORLD).

(MPL currently has no wrapper for error handling. Because mpl::communicator encapsulates the MPI communicator, you will not be able to access this MPI communicator directly.)

@VictorEijkhout
Author

  1. Maybe add a method that extracts the MPI_Comm from your communicator object? There's quite a number of routines in MPI that attach godknowswhat to a communicator.
  2. The request::error method returns the numerical error. I can of course convert that with MPI_Error_string to a char* but that feels un-C++. Is there a routine that gives me the error text as a std::string?
  3. What does error::what do?

@rabauke
Owner

rabauke commented Sep 27, 2020

@VictorEijkhout The MPI standard provides rather weak guarantees when it comes to error handling. Therefore, I did not pay much attention to this topic when creating the MPL library on top of MPI. There is certainly room for improvement.

I thought about how MPI's error handling concepts can be incorporated into a nice C++ library. I tried to wrap MPI_Comm_set_errhandler and related functions. I think, however, that the whole concept of error handlers feels very unidiomatic for a C++ library. In some sense, error handlers are a C way of exception handling.

Therefore, I currently plan to implement the following:

  • All MPI communicators will have the MPI_ERRORS_RETURN error handler attached.
  • MPL will always check the return values of all calls to MPI functions internally. In case of an error, an exception will be thrown. That is, MPL functions do not return error codes, and there are no API changes to existing MPL functions.
  • The exception type will derive from std::exception.
  • The exception's what method will return a string given by MPI_Error_string.
  • MPL will not provide equivalents to MPI_Comm_set_errhandler and related functions.
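
A minimal sketch of what an exception type along these lines might look like. The names `mpi_error`, `check`, `lookup_error_text`, and `demo` are hypothetical and not taken from MPL. To keep the sketch compilable without an MPI installation, `lookup_error_text` is a stand-in for a call to MPI_Error_string, and 0 plays the role of MPI_SUCCESS.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Stand-in for MPI_Error_string (assumption: a real implementation would
// fill a buffer of MPI_MAX_ERROR_STRING chars via MPI_Error_string).
inline std::string lookup_error_text(int code) {
  return "MPI error code " + std::to_string(code);
}

// Hypothetical exception type that derives from std::exception.
class mpi_error : public std::exception {
public:
  explicit mpi_error(int code)
      : code_(code), text_(lookup_error_text(code)) {}

  // Returns the error text looked up at construction time.
  const char *what() const noexcept override { return text_.c_str(); }

  int code() const noexcept { return code_; }

private:
  int code_;
  std::string text_;
};

// Hypothetical helper: turn a nonzero return value into an exception.
inline void check(int return_value) {
  if (return_value != 0) {
    throw mpi_error(return_value);
  }
}

// Usage: a failing call surfaces as an exception instead of a return code.
inline void demo() {
  try {
    check(5);  // pretend an MPI call returned error code 5
    assert(false && "not reached");
  } catch (const mpi_error &e) {
    assert(e.code() == 5);
  }
}
```

With such a helper, every internal MPI call would be wrapped as `check(MPI_Something(...))`, so the public functions keep their signatures.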

@VictorEijkhout
Author

Your plan re: error handling sounds good to me.

My only comment: there is a bunch of other stuff (attribute, name, parent, info) that is attached to a communicator. I'd still appreciate a routine that gives me direct access to the communicator.

@rabauke
Owner

rabauke commented Sep 28, 2020

In principle, one could include information about the communicator in the exception that is thrown when an error occurs, e.g., via a reference or a pointer to the respective communicator. This approach, however, becomes problematic if the communicator is created within a try block and an error occurs for this communicator: due to stack unwinding, the communicator is destroyed before the exception is caught. With such an approach, the programmer must therefore place try-catch blocks with care.
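
The lifetime problem can be illustrated without MPI. In the following sketch, `fake_communicator`, `comm_error`, `event_log`, and `run_demo` are hypothetical names made up for illustration; `fake_communicator` merely stands in for a communicator object and is not the MPL class.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Records the order of events so that it can be inspected afterwards.
inline std::vector<std::string> &event_log() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stand-in for a communicator object.
struct fake_communicator {
  ~fake_communicator() { event_log().push_back("communicator destroyed"); }
};

// Hypothetical exception that carries a pointer to the failing communicator.
struct comm_error : std::runtime_error {
  const fake_communicator *comm;
  explicit comm_error(const fake_communicator *c)
      : std::runtime_error("communication failed"), comm(c) {}
};

inline std::vector<std::string> run_demo() {
  event_log().clear();
  try {
    fake_communicator comm;   // created inside the try block
    throw comm_error(&comm);  // error reported for this communicator
  } catch (const comm_error &) {
    // Stack unwinding has already destroyed comm at this point, so the
    // pointer stored in the exception dangles and must not be dereferenced.
    event_log().push_back("exception caught");
  }
  // Both events were recorded, one by the destructor and one by the handler.
  assert(event_log().size() == 2);
  return event_log();
}
```

The log returned by `run_demo` lists the destruction of the communicator before the catch handler runs, which is why a pointer or reference stored in the exception is only safe if the communicator outlives the try block.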

Furthermore, the MPI standard states

After an error is detected, the state of MPI is undefined. That is, using a user-defined error handler, or MPI_ERRORS_RETURN, does not necessarily allow the user to continue to use MPI after an error is detected. The purpose of these error handlers is to allow a user to issue user-defined error messages and to take actions unrelated to MPI (such as flushing I/O buffers) before a program exits. An MPI implementation is free to allow MPI to continue after an error but is not required to do so.

This means there is no guarantee that one can actually access any of the data (attribute, name, parent, info) that is attached to a communicator after an error has occurred.

Maybe I have to think about this more carefully.

@rabauke rabauke self-assigned this Nov 29, 2020
@VictorEijkhout
Author

Your last reply is solely in the context of error handling. There are many other attributes that can be attached to communicators (and files, windows, probably more) that have nothing to do with error handling and that I think ought to be supported. At some point.

@rabauke
Owner

rabauke commented Jul 31, 2023

@VictorEijkhout: There has been an info class in MPL for some time (see https://github.com/rabauke/mpl/blob/master/mpl/info.hpp). Furthermore, the communicator class has two info methods for setting and getting an info object. Various communicator creation routines employ an info object. Does this fulfill your needs for attaching attributes to communicators?

@VictorEijkhout
Author

There is MPI_Comm_set_attr, MPI_Comm_set_errhandler, MPI_Comm_set_name which I don't think you can currently support.

I'm not sure that they are very urgent. For instance, I mostly use MPI_Comm_get_attr for the upper bound on tags, which you support in the tag class.
