comm: create local_group/remote_group before comm commit #7237
base: main
This test requires access to MPICH internals, so it won't be used with the current design.
We no longer use this file.
Hide the internal fields of MPIR_Group from unnecessary access. Outside group_util.c and group_impl.c, code only needs to assume the MPIR_Lpid integer type, the creation routines based on an lpid map or an lpid stride description, and the access routine that looks up an lpid from a group rank.
For most external usages, we only need MPIR_Group_rank_to_lpid.
Avoid accessing group internal fields.
Group similar functions together to facilitate refactoring. There are no changes in this commit other than moving functions around. The 4 incl/excl functions are very similar. The 3 difference/intersection/union functions are very similar.
Use MPIR_Group_{rank_to_lpid,lpid_to_rank} to avoid directly accessing MPIR_Group internal fields. For most group creation routines, just populate an lpid lookup map and call MPIR_Group_create_map to create the group.
* add option to use stride to describe group composition
* remove the linked list design
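To illustrate the two group descriptions mentioned in these commits (an explicit lpid map versus an offset/stride description), here is a minimal sketch. It is not the MPIR_Group implementation: the `sketch_*` names, the struct layout, and the `-1` "not found" convention are all assumptions made for illustration, and `sketch_lpid` is only a stand-in for MPIR_Lpid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t sketch_lpid;   /* hypothetical stand-in for MPIR_Lpid */

/* Hypothetical group record: either an explicit map or (offset, stride). */
typedef struct {
    int size;
    int use_map;                /* 1: explicit map, 0: stride description */
    sketch_lpid *map;           /* owned copy, used only when use_map == 1 */
    sketch_lpid offset;
    sketch_lpid stride;
} sketch_group;

/* Create a group from an explicit rank -> lpid map (the map is copied). */
sketch_group sketch_group_from_map(int size, const sketch_lpid *map)
{
    sketch_group g = { size, 1, NULL, 0, 0 };
    assert(size > 0 && map != NULL);
    g.map = malloc((size_t) size * sizeof(sketch_lpid));
    assert(g.map != NULL);
    memcpy(g.map, map, (size_t) size * sizeof(sketch_lpid));
    return g;
}

/* Create a group whose rank r maps to offset + r * stride. */
sketch_group sketch_group_from_stride(int size, sketch_lpid offset, sketch_lpid stride)
{
    sketch_group g = { size, 0, NULL, offset, stride };
    assert(size > 0 && stride > 0);
    return g;
}

/* Look up the lpid of a group rank. */
sketch_lpid sketch_rank_to_lpid(const sketch_group *g, int rank)
{
    assert(rank >= 0 && rank < g->size);
    return g->use_map ? g->map[rank] : g->offset + (sketch_lpid) rank * g->stride;
}

/* Reverse lookup; returns -1 when the lpid is not in the group. */
int sketch_lpid_to_rank(const sketch_group *g, sketch_lpid lpid)
{
    if (g->use_map) {
        for (int i = 0; i < g->size; i++)
            if (g->map[i] == lpid)
                return i;
        return -1;
    }
    if (lpid < g->offset || (lpid - g->offset) % g->stride != 0)
        return -1;
    sketch_lpid r = (lpid - g->offset) / g->stride;
    return r < (sketch_lpid) g->size ? (int) r : -1;
}

/* Release the map copy, if any. */
void sketch_group_free(sketch_group *g)
{
    free(g->map);
    g->map = NULL;
}
```

With accessors like these, callers outside the group implementation never need to know which of the two descriptions a group uses.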
This is the same as MPID_Comm_get_lpid. NOTE: we will remove MPID_Comm_get_lpid as well once we move the ownership of lpid to the MPIR layer.
f3257ba to 0cf5832
There is no real difference between lpid and gpid, so rename gpid in the device layer to lpid for clarity. Replace uint64_t as the type of lpid with MPIR_Lpid. This improves consistency.
b204a9a to acda531
We need a device-independent way of identifying processes. One way is to use the combination (world_idx, world_rank). Thus, we need to maintain a list of worlds so that world_idx points to the world record. This may not fit the concept of an MPI group, but since a group needs a way to identify processes, it seems the most closely related place. The first world, world_idx 0, is always initialized at init. Because of session re-init, we need to make sure to reset num_worlds to 0 at finalize. New worlds will be added upon spawning or connecting dynamic processes (to be implemented).
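As a rough sketch of the (world_idx, world_rank) idea, the following keeps a small table of worlds and packs the pair into one 64-bit id. The 32/32 bit split, the fixed table size, and every `sketch_*` name are assumptions for illustration; the commit message does not specify the actual encoding.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_lpid;        /* hypothetical stand-in for MPIR_Lpid */
#define SKETCH_MAX_WORLDS 8          /* arbitrary limit for this sketch */

typedef struct {
    int num_procs;                   /* number of processes in this world */
} sketch_world;

static sketch_world sketch_worlds[SKETCH_MAX_WORLDS];
static int sketch_num_worlds = 0;

/* Register a new world; returns its world_idx, or -1 on failure. */
int sketch_add_world(int num_procs)
{
    if (num_procs <= 0 || sketch_num_worlds >= SKETCH_MAX_WORLDS)
        return -1;
    sketch_worlds[sketch_num_worlds].num_procs = num_procs;
    return sketch_num_worlds++;
}

/* Reset at finalize so that a later re-init starts again from world 0. */
void sketch_reset_worlds(void)
{
    sketch_num_worlds = 0;
}

int sketch_world_count(void)
{
    return sketch_num_worlds;
}

/* Pack (world_idx, world_rank): upper 32 bits index, lower 32 bits rank. */
sketch_lpid sketch_make_lpid(int world_idx, int world_rank)
{
    assert(world_idx >= 0 && world_idx < sketch_num_worlds);
    assert(world_rank >= 0 && world_rank < sketch_worlds[world_idx].num_procs);
    return ((sketch_lpid) world_idx << 32) | (uint32_t) world_rank;
}

int sketch_lpid_world_idx(sketch_lpid lpid)
{
    return (int) (lpid >> 32);
}

int sketch_lpid_world_rank(sketch_lpid lpid)
{
    return (int) (lpid & 0xffffffffu);
}
```

Under this encoding, ids in world 0 are numerically equal to the world rank, which keeps the common single-world case simple.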
Add builtin MPIR_GROUP_WORLD and MPIR_GROUP_SELF, so we can create builtin communicators from builtin groups.
Internally the only reason to duplicate a group is to copy from NULL session to a new session. Otherwise, we can just use the same group and increment the reference count.
Since builtin groups can be returned to users, users should be allowed to free them. They are reference counted anyway.
To make MPI group a first-class citizen, we will always have the group before creating a communicator, so that when the device layer activates communicators, e.g. in MPID_Comm_commit_pre_hook, it can rely on the group to look up the involved processes. It also removes the need to maintain any other process addressing schemes.
In many places we just return MPIR_Group_empty without incrementing the ref_count. This is fixable. But for now, let's avoid freeing it.
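The point about sharing a group through its reference count instead of duplicating it can be sketched as follows. This is only an illustration of plain reference counting under assumed names (`sketch_group_new`, `sketch_group_share`, `sketch_group_release`); it does not reproduce MPICH's object or handle machinery.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, minimal reference-counted group record. */
typedef struct {
    int ref_count;
    int size;
} sketch_group;

/* Allocate a group holding one reference; returns NULL on allocation failure. */
sketch_group *sketch_group_new(int size)
{
    sketch_group *g = malloc(sizeof(*g));
    if (g != NULL) {
        g->ref_count = 1;
        g->size = size;
    }
    return g;
}

/* "Duplicate" by sharing: same object, one more reference. */
sketch_group *sketch_group_share(sketch_group *g)
{
    assert(g != NULL && g->ref_count > 0);
    g->ref_count++;
    return g;
}

/* Drop one reference; returns 1 if the storage was released, else 0. */
int sketch_group_release(sketch_group *g)
{
    assert(g != NULL && g->ref_count > 0);
    if (--g->ref_count == 0) {
        free(g);
        return 1;
    }
    return 0;
}
```

A free issued by the user then simply drops one reference, so an object that is still referenced elsewhere (for example by a communicator) stays alive.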
The init_comm does the release manually.
Add assertions to make sure the local_group and remote_group (for inter communicators) are always set before MPID_Comm_commit_pre_hook.
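A minimal sketch of the kind of check described here, assuming hypothetical `sketch_*` types rather than the real MPIR_Comm: an intracommunicator needs its local group, an intercommunicator additionally needs its remote group, and the check runs before the point where a device hook would be invoked.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SKETCH_INTRACOMM, SKETCH_INTERCOMM } sketch_comm_kind;

typedef struct {
    int ref_count;
    int size;
} sketch_group;

/* Hypothetical communicator record with only the fields this sketch needs. */
typedef struct {
    sketch_comm_kind kind;
    sketch_group *local_group;
    sketch_group *remote_group;   /* only meaningful for intercommunicators */
    int committed;
} sketch_comm;

/* Returns 1 when the groups required for this kind of communicator are set. */
int sketch_comm_groups_ready(const sketch_comm *comm)
{
    if (comm->local_group == NULL)
        return 0;
    if (comm->kind == SKETCH_INTERCOMM && comm->remote_group == NULL)
        return 0;
    return 1;
}

/* Commit: assert the precondition, then mark the communicator committed. */
int sketch_comm_commit(sketch_comm *comm)
{
    assert(sketch_comm_groups_ready(comm));
    /* a device-layer pre-commit hook would run here and could rely on the groups */
    comm->committed = 1;
    return 0;
}
```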
acda531 to 5a53a62
test:mpich/ch3/most All ✔️
test:mpich/ch4/ofi
    if (sizeof(MPIR_Lpid) == 8) {
        lpid_datatype = MPI_UINT64_T;
    } else {
        MPIR_Assert(sizeof(MPIR_Lpid) == 4);
Is there a case where MPIR_Lpid is defined as uint32_t? I thought it is all uint64_t since #7235.
@@ -30,6 +30,9 @@ int MPIR_init_comm_world(void)
    MPIR_Process.comm_world->remote_size = MPIR_Process.size;
If the remote_group is NULL, the remote_size should probably be 0.
Or, if we want to keep local_size == remote_size for now to lessen the impact on existing code, we should update the comment in the struct definition and maybe add a TODO for future cleanup.
@@ -30,6 +30,9 @@ int MPIR_init_comm_world(void)
    MPIR_Process.comm_world->remote_size = MPIR_Process.size;
    MPIR_Process.comm_world->local_size = MPIR_Process.size;

    MPIR_Process.comm_world->local_group = MPIR_GROUP_WORLD_PTR;
    MPIR_Group_add_ref(MPIR_GROUP_WORLD_PTR);
Maybe explicitly set remote_group to NULL to avoid an uninitialized value.
@@ -494,6 +496,7 @@ int MPIR_Comm_create_inter(MPIR_Comm * comm_ptr, MPIR_Group * group_ptr, MPIR_Co

    MPIR_Assert(remote_size >= 0);

Remove the blank line.
 * to initialize the init_comm, e.g. to eliminate potential
 * runtime features for stability during init */
 * to initialize the init_comm, e.g. to eliminate potential
 * runtime features for stability during init */
What is changed here?
Pull Request Description
[Dependency PR #7235 ]
At first there was only MPI_COMM_WORLD. Later we added dynamic processes, MPI groups, and lately, MPI sessions. Because these later concepts were not part of the original vision, they were implemented by hacking rather than by design. For example, instead of locally creating a group and then creating a communicator from the group, we do it in reverse. We assume the communicator is always there -- an MPI_COMM_WORLD which can later split and recombine -- and we derive groups from existing communicators. In the original design, the whole process addressing system is based on communicators. It is a mess! The latest addition of MPI sessions throws a wrench into this mess because now we have a situation where communicators are not always there.
The current situation:
* comm uses mapper -- an address system based on parent communicators
* MPIR_Comm_map_t
* MPIDI_VCRT table for each communicator based on the mapper
* vcrt in the dup case
* MPIDI_rank_map_t which refers to a global avt_mgr (av table manager)
* lpid, accessed using MPID_Comm_get_lpid
* lpids
This convoluted mess is because we designed lpid to be device-layer opaque and mysterious. Within the current upstream code base, we have 4 address systems -
* mapper
* VCRT
And we are about to add "MPI Session PSET", yet another address system.
I propose to unite them all into a single system and make MPI Group a first-class citizen.
We can use a universal address system based on the (world_idx, world_rank) combination.
In hindsight, we should have designed it the session way --
* init
* discover the world
The PR tries to do just that.
[skip warnings]
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.