-
Notifications
You must be signed in to change notification settings - Fork 874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of the current state of MPI Continuations proposal [WIP] #9502
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the impact of the continuations on the latency ? What if we mix normal requests and continuations in the same app ?
I pushed a bunch of changes. The biggest change is that a continuation request is now derived from |
Oh, and I will squash everything down to one commit once we're done... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitively much cleaner.
ompi/mpiext/continue/configure.m4
Outdated
|
||
# This example can always build, so we just execute $1 if it was | ||
# requested. | ||
AS_IF([test "$ENABLE_continue" = "1" || \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These IFs looks similar enough to be merged.
2f49fa2
to
fa149c2
Compare
@bosilca I pushed a few changes:
|
int ompi_continue_progress_request(ompi_request_t *cont_req); | ||
|
||
/** | ||
* Attach a continuation to a set of operations represented by \c requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this \c ?
|
||
/** | ||
* Register the provided continuation request to be included in the | ||
* global progress loop (used while a thread is waiting for the contnuation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo. contnuation
|
||
opal_atomic_unlock(&cont_req->cont_lock); | ||
|
||
if (NULL == thread_progress_list) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this global object not protected for concurrent accesses ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a thread_local variable that stores the continuation requests the current thread is waiting on
0fffb9c
to
e014717
Compare
Hello! The Git Commit Checker CI bot found a few problems with this PR: d80d3ca: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
03c36a5
to
e2979db
Compare
Hello! The Git Commit Checker CI bot found a few problems with this PR: f3babb5: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
5 similar comments
Hello! The Git Commit Checker CI bot found a few problems with this PR: f3babb5: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
Hello! The Git Commit Checker CI bot found a few problems with this PR: f3babb5: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
Hello! The Git Commit Checker CI bot found a few problems with this PR: f3babb5: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
Hello! The Git Commit Checker CI bot found a few problems with this PR: f3babb5: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
Hello! The Git Commit Checker CI bot found a few problems with this PR: f3babb5: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
df79651
to
c8c0437
Compare
Hello! The Git Commit Checker CI bot found a few problems with this PR: f3babb5: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
3 similar comments
Hello! The Git Commit Checker CI bot found a few problems with this PR: f3babb5: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
Hello! The Git Commit Checker CI bot found a few problems with this PR: f3babb5: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
Hello! The Git Commit Checker CI bot found a few problems with this PR: f3babb5: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
a02d677
to
b6440c0
Compare
Hello! The Git Commit Checker CI bot found a few problems with this PR: a32e7e3: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
2 similar comments
Hello! The Git Commit Checker CI bot found a few problems with this PR: a32e7e3: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
Hello! The Git Commit Checker CI bot found a few problems with this PR: a32e7e3: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
Explicit global progress instead of global progress when progressing a request, to avoid recursive progress calls. Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Also: always store continuations directly in poll-only CRs. Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
- Swap flags and max_poll parameter in MPI_Continue_init - Swap cont_req and info parameter in MPI_Continue_init - Add new flags and remove support for info keys mpi_continue_poll_only, mpi_continue_max_poll, and mpi_continue_enqueue_complete Signed-off-by: Joseph Schuchart <[email protected]>
This was a temporary bandaid to fix the C11 sequentially consistent store... Signed-off-by: Joseph Schuchart <[email protected]>
Continuation requests may be marked as complete before another continuation is registered. In that case, we should mark the continuation request as pending again, unless a sync object has been registered already. Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
The cont->cont_req field was reset *after* it was returned to the freelist... Signed-off-by: Joseph Schuchart <[email protected]>
We must protect the activation of the completion of a continuation request to ensure that we don't miss any updates. Signed-off-by: Joseph Schuchart <[email protected]>
Runnable continuations should be added to cont_complete_defer_list if the continuation request is inactive, so that it can be enqueued for execution once the request is enabled. Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
This was a temporary idea that has been removed again. Signed-off-by: Joseph Schuchart <[email protected]>
MPIX_CONT_PERSISTENT will not be part of the first proposal and has been removed. Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
790a7b6
to
68b887c
Compare
Hello! The Git Commit Checker CI bot found a few problems with this PR: d77ca1c: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
It is not sufficient to just set one value, we need proper initialization through OMPI_REQUEST_INIT. Signed-off-by: Joseph Schuchart <[email protected]>
Hello! The Git Commit Checker CI bot found a few problems with this PR: d77ca1c: Remove re-iteration of continuation requests in te...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
I found something that may need more clarification. The continuation proposal says
It seems to me that the MPI runtime should set the status of the continuation request to This difference matters because it decides whether the following code logic is correct: if my application never tested (and then re-started) the continuation request (imagining I have some magic way to keep making progress on the runtime e.g., testing other regular requests), could I keep attaching new continues to it and assume they always get executed without delay? |
This PR adds an implementation of the MPI Continuations proposal as extension to Open MPI. This proposal is current under discussion in the MPI hybrid and accelerator working group. By integrating it as an extension into Open MPI, I am hoping to provide an up-to-date implementation for the community to experiment with. It reflects the current state of the proposal and will be kept in sync with the evolving proposal.
The implementation is mostly confined to the extension itself, with the exception of hooks in request test and wait functions used to allow polling on a continuation request to complete outstanding continuations. This is needed for the implementation of the
"mpi_continue_poll_only"
info key (see the description below).Note: this PR is currently WIP as it relies on a workaround to define
OMPI_HAVE_MPI_EXT_CONTINUE
, which is used to disable the hooks in request test/wait functions if the extension is not enabled. The underlying issue is that the extensions integration currently does not support including thempiext.h
header in implementation files (needed to disable the hooks mentioned above).Overview of MPI Continuations
Continuations provide a mechanism for attaching callbacks to outstanding operation requests. A call to
MPIX_Continue
takes a request, a function pointer, a user-provided data pointer, and a status object (orMPI_STATUS_IGNORE
), along with a continuation request and attaches the continuation to the operation request:The ownership of non-persistent requests is returned to MPI and the pointer to the request will be set to
MPI_REQUEST_NULL
. The callback is passed the status pointer and the user-provided data pointer:The status has to remain valid until the invocation of the callback and is set according to the operation before the callback is invoked.
The continuation is registered with the provided continuation request. The continuation request is a request allocated using
MPIX_Continue_init
:Continuation requests may be used to test/wait for completion of all continuations registered with it using
MPI_Test/Wait
. Supported info keys are:"mpi_continue_poll_only"
: if true, only execute continuations whenMPI_Test/Wait
is called on the continuation request. If false, continuations may be executed at any time inside a call into MPI (inside a callback registered withopal_progress
in the implementation; default: false)."mpi_continue_enqueue_complete"
: if false, the continuation is executed immediately if the operations are already complete when MPIX_Continue is called. Execution is deferred otherwise (default: false)"mpi_continue_max_poll"
: the maximum number of continuations to execute when callingMPI_Test
on the continuation request (default: -1, meaning unlimited)A continuation may in turn be attached to a continuation request, in which case it will be executed once all continuations registered with the continuation request have completed.
In addition to
MPIX_Continue
, the proposal also includesMPIX_Continueall
which attaches a continuation to a set of requests such that the continuation is executed once all operations have completed.Signed-off-by: Joseph Schuchart [email protected]
Signed-off-by: George Bosilca [email protected]