datatype: map builtin MPI datatypes to internal types #7264

hzhou · 2025-01-15T05:49:58Z

Pull Request Description

Many of the MPI datatypes are redundant, for example, MPI_INT, MPI_INT32_T, and MPI_INTEGER. The new ABI proposal requires setting some of these types at runtime, for example, via MPI_Abi_set_fortran_info. In this PR, we create a set of fixed-size internal datatypes and map all external builtin types to internal ones. This allows setting and resetting any builtin type at runtime, and internally we only need to support a fixed set.
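A minimal sketch of the runtime mapping idea, assuming a simple lookup table from external builtin types to internal ones. The enum names (`internal_type`, `EXT_MPI_INT`, ...) and the setter `set_fortran_integer_size` are hypothetical stand-ins, not MPICH's actual API; the setter only imitates the spirit of MPI_Abi_set_fortran_info.

```c
#include <stddef.h>

/* Hypothetical fixed-size internal types (illustrative names only). */
typedef enum {
    INTERNAL_NULL = 0,
    INTERNAL_INT4,
    INTERNAL_INT8,
    INTERNAL_FLOAT4,
    INTERNAL_FLOAT8
} internal_type;

/* Hypothetical indices for a few external builtin datatypes. */
enum { EXT_MPI_INT, EXT_MPI_INT32_T, EXT_MPI_INTEGER, EXT_MPI_DOUBLE, EXT_COUNT };

/* Redundant external types collapse onto the same internal type.
 * Assumes a 4-byte C int and a 4-byte default Fortran INTEGER. */
static internal_type builtin_map[EXT_COUNT] = {
    [EXT_MPI_INT] = INTERNAL_INT4,
    [EXT_MPI_INT32_T] = INTERNAL_INT4,
    [EXT_MPI_INTEGER] = INTERNAL_INT4,
    [EXT_MPI_DOUBLE] = INTERNAL_FLOAT8,
};

/* Translate an external builtin index to its current internal type. */
internal_type to_internal(int ext)
{
    if (ext < 0 || ext >= EXT_COUNT)
        return INTERNAL_NULL;
    return builtin_map[ext];
}

/* Hypothetical runtime hook: re-point MPI_INTEGER once the Fortran
 * INTEGER size is known. Returns -1 and keeps the old mapping otherwise. */
int set_fortran_integer_size(size_t nbytes)
{
    switch (nbytes) {
    case 4: builtin_map[EXT_MPI_INTEGER] = INTERNAL_INT4; return 0;
    case 8: builtin_map[EXT_MPI_INTEGER] = INTERNAL_INT8; return 0;
    default: return -1;
    }
}
```

Since code below the binding layer only ever sees the right-hand side of the table, re-pointing one entry does not require touching anything else.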

MPI datatypes are defined by the MPI Forum. They mix alias types, types from different languages, and unnamed types.
C types are defined by the C language. They include aliases, have mixed availability, and vary across compilers.
Internal types are defined by MPICH. They are stable and deterministic, and can be used in switch cases.
The handle bits work with the current bit logic for builtin checks and type size, and also contain an index to the user-input type.
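A sketch of how a fixed handle encoding keeps the builtin check and the size lookup as cheap bit operations. The bit layout is an assumption modeled loosely on MPICH's existing builtin handle style, and the `dt_*` helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout:
 *   bits 26..31 : handle kind + object type (0x4c000000 = builtin datatype)
 *   bits  8..15 : element size in bytes
 *   bits  0..7  : index (e.g., into a table of user-visible types)
 */
#define DT_BUILTIN_MARK 0x4c000000u
#define DT_KIND_MASK    0xfc000000u

static inline uint32_t dt_make_builtin(uint32_t size, uint32_t index)
{
    return DT_BUILTIN_MARK | ((size & 0xffu) << 8) | (index & 0xffu);
}

static inline int dt_is_builtin(uint32_t handle)
{
    return (handle & DT_KIND_MASK) == DT_BUILTIN_MARK;
}

/* The size is read straight from the handle: no table lookup needed. */
static inline uint32_t dt_size(uint32_t handle)
{
    return (handle >> 8) & 0xffu;
}

/* The low byte identifies which user-visible type this handle stands for. */
static inline uint32_t dt_index(uint32_t handle)
{
    return handle & 0xffu;
}
```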

Discussion

For communication routines, such as MPI_Send and MPI_Bcast, we can replace builtin datatypes with MPIR_FIXED#, since all we care about is the data size.
For reduction routines, such as MPI_Reduce and MPI_Accumulate, we can replace builtin datatypes with internal types (but not MPIR_FIXED#).
For datatype creation routines, I think we can replace all builtin "oldtype" arguments with MPIR_FIXED#. We don't need to worry about reduction ops because we always rely on user ops for them.
NOTE: once we convert to internal types, we will not be able to perform strict type-matching validation, such as detecting a mismatch between MPI_INT and MPI_FLOAT. But we don't perform such validation today anyway; it is extra overhead that we can't afford. In principle, we could perform strict type matching under, e.g., --enable-error-checking=2.
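A minimal sketch of the size-only replacement for communication routines. The `fixed_type` enum and `fixed_for_size` are hypothetical stand-ins for the MPIR_FIXED# types, not MPICH code.

```c
#include <stddef.h>

/* Hypothetical size-only internal types, mirroring the MPIR_FIXED# idea. */
typedef enum { FIXED_NONE = 0, FIXED1, FIXED2, FIXED4, FIXED8, FIXED16 } fixed_type;

/* For pure data movement only the element size matters, so any builtin
 * type of a given size collapses onto the same FIXED type. */
fixed_type fixed_for_size(size_t nbytes)
{
    switch (nbytes) {
    case 1: return FIXED1;
    case 2: return FIXED2;
    case 4: return FIXED4;
    case 8: return FIXED8;
    case 16: return FIXED16;
    default: return FIXED_NONE;  /* e.g., a 12-byte long double on i386 */
    }
}
```

This also shows why strict type matching is lost: a 4-byte int and a 4-byte float both become FIXED4 and are indistinguishable afterwards.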

[skip warnings]

Author Checklist

Provide Description
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form: module: short description
Commit message explains what's in the commit.
Passes All Tests
Whitespace checker. Warnings test. Additional tests via comments.
Contribution Agreement
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.

dalcinl · 2025-01-22T04:49:19Z

@hzhou Is Intel 32-bit (i386) something you no longer care about at all? If you still do, adding an MPIR_FLOAT12 to support long double may not be that hard.

hzhou · 2025-01-22T15:20:49Z

We still want to support i386. I am not sure about MPIR_FLOAT12. "long double" is 16-byte, 80-bit, 4-byte alignment on i386. I think we need to make it into its own category, e.g. just call it MPIR_LONG_DOUBLE.

dalcinl · 2025-01-22T17:39:39Z

You wrote that "long double" is 16-byte. IIRC, it is 12 bytes on i386 (at least on Linux, but not on Windows).
That's why I proposed MPIR_FLOAT12. This type would be NULL on x86_64.
However, MPIR_LONG_DOUBLE would work totally fine, maybe even better: its meaning would simply be "whatever the platform's C long double is", so there is no ambiguity.

hzhou · 2025-01-22T18:08:29Z

Here is my current scheme:

MPIR_FLOAT[2,4,8,16]  /* for IEEE 754 floating points */
MPIR_FLOAT_ALT[2,12,16] /* for alternative format, e.g. BFLOAT16, long double on i386 and x86_64 */

Hopefully we will only ever need to support one alternative format. If more are needed, we'll need new category names.

As you pointed out, long double may have different sizes depending on the platform, but my design goal is to have fixed handle bits and semantics for internal types.
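A sketch of how the platform's long double could be sorted into the categories above, using its storage size and `LDBL_MANT_DIG` from `<float.h>`. Only the category names follow the scheme; the enum values, `MPIR_FLOAT_NONE`, and both functions are hypothetical.

```c
#include <float.h>
#include <stddef.h>

/* Category names follow the proposed scheme; values are hypothetical. */
typedef enum {
    MPIR_FLOAT_NONE = 0,
    MPIR_FLOAT2, MPIR_FLOAT4, MPIR_FLOAT8, MPIR_FLOAT16, /* IEEE 754 */
    MPIR_FLOAT_ALT2, MPIR_FLOAT_ALT12, MPIR_FLOAT_ALT16  /* alternative formats */
} float_category;

/* Classify a long double from its storage size and mantissa digits. */
float_category classify_long_double(size_t size, int mant_dig)
{
    if (mant_dig == 53 && size == 8)
        return MPIR_FLOAT8;       /* long double == double, e.g., MSVC */
    if (mant_dig == 113 && size == 16)
        return MPIR_FLOAT16;      /* IEEE binary128, e.g., AArch64 Linux */
    if (mant_dig == 64 && size == 12)
        return MPIR_FLOAT_ALT12;  /* x87 80-bit extended, padded (i386) */
    if (mant_dig == 64 && size == 16)
        return MPIR_FLOAT_ALT16;  /* x87 80-bit extended, padded (x86_64) */
    return MPIR_FLOAT_NONE;       /* e.g., IBM double-double: unsupported */
}

/* What this build's compiler reports for its own long double. */
float_category classify_native_long_double(void)
{
    return classify_long_double(sizeof(long double), LDBL_MANT_DIG);
}
```

The handle for each category stays fixed; only the choice of category for "long double" varies by platform.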

dalcinl · 2025-01-23T06:19:43Z

This is great for the time being!

hzhou · 2025-02-04T16:15:30Z

test:mpich/ch3/most
test:mpich/ch4/most

* In yaksa, map internal types to yaksa builtin types.
* Use internal types in MPIR_Datatype_builtintype_alignment.

* Replace, e.g., MPI_INT with MPIR_INT_INTERNAL.
* Use the new groups in MPIR_OP_TYPE_GROUP.

Directly use fi_datatype and fi_op to index MPIDI_OFI_global.win_op_table and dtypes_max_count in MPIDI_OFI_win_acc_hint_t. We'll directly convert MPI datatypes and MPI ops to fi_datatype and fi_op.
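A sketch of a capability table indexed directly by provider datatype and op. The `demo_fi_*` enums are stand-ins for libfabric's `enum fi_datatype` and `enum fi_op` so the example compiles without libfabric; the table layout and function names are hypothetical.

```c
#include <stddef.h>

/* Stand-ins for libfabric's enum fi_datatype and enum fi_op. */
enum demo_fi_datatype { DEMO_FI_INT32, DEMO_FI_INT64, DEMO_FI_FLOAT, DEMO_FI_DOUBLE, DEMO_FI_DT_COUNT };
enum demo_fi_op { DEMO_FI_MIN, DEMO_FI_MAX, DEMO_FI_SUM, DEMO_FI_OP_COUNT };

/* Hypothetical per-(datatype, op) capability entry. */
typedef struct {
    int supported;
    size_t max_count; /* largest element count a single atomic may carry */
} win_op_entry;

/* Indexed directly by provider enums: no MPI handle decoding on the hot path. */
static win_op_entry win_op_table[DEMO_FI_DT_COUNT][DEMO_FI_OP_COUNT];

/* Filled once at init, e.g., from libfabric's fi_query_atomic results. */
void record_atomic_cap(enum demo_fi_datatype dt, enum demo_fi_op op, size_t max_count)
{
    win_op_table[dt][op].supported = 1;
    win_op_table[dt][op].max_count = max_count;
}

/* Returns 0 when the provider cannot perform this atomic natively. */
size_t atomic_max_count(enum demo_fi_datatype dt, enum demo_fi_op op)
{
    const win_op_entry *e = &win_op_table[dt][op];
    return e->supported ? e->max_count : 0;
}
```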

At the device layer, we only need to deal with internal datatypes.

The external builtin datatypes, e.g., MPI_INT, may be reconfigured at runtime. This is unlikely in practice, but it is possible by design, so that all MPI builtin datatypes, such as MPI_INT or MPI_INTEGER, are treated the same.

It is unnecessary, and internal types don't have corresponding datatype structures.

Refactor MPIR_Type_match_size_impl and support MPIX_TYPECLASS_LOGICAL. NOTE: since the fixed-width types are now always available, we only need to match one of the fixed-width types. Check whether the reduction op is available, in case, for example, we don't have a matching native C type.
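A sketch of size matching once fixed-width types always exist: pick the internal type from the (typeclass, size) pair, then reject it if no reduction support is available. All enums are hypothetical, and the `reduction_available` rule (treating 16-byte floats as lacking a native C type) is an assumption for illustration; the real MPIR_Type_match_size_impl may differ.

```c
#include <stddef.h>

/* Hypothetical typeclass and internal-type enums. */
typedef enum { TC_INTEGER, TC_REAL, TC_LOGICAL } typeclass_t;
typedef enum {
    IT_NULL = 0,
    IT_INT1, IT_INT2, IT_INT4, IT_INT8, IT_INT16,
    IT_FLOAT2, IT_FLOAT4, IT_FLOAT8, IT_FLOAT16,
    IT_BOOL1, IT_BOOL2, IT_BOOL4, IT_BOOL8, IT_BOOL16
} itype_t;

/* Map 1, 2, 4, 8, 16 bytes to slots 0..4; other sizes have no match. */
static int size_slot(size_t size)
{
    switch (size) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    case 8: return 3;
    case 16: return 4;
    default: return -1;
    }
}

/* Assumption: no native C type (hence no op support) for 16-byte floats. */
static int reduction_available(itype_t t)
{
    return t != IT_FLOAT16;
}

itype_t match_size(typeclass_t tc, size_t size)
{
    int slot = size_slot(size);
    itype_t t;
    if (slot < 0)
        return IT_NULL;
    switch (tc) {
    case TC_INTEGER: t = (itype_t)(IT_INT1 + slot); break;
    case TC_REAL:
        if (slot == 0)
            return IT_NULL;                          /* no 1-byte float */
        t = (itype_t)(IT_FLOAT2 + slot - 1); break;
    case TC_LOGICAL: t = (itype_t)(IT_BOOL1 + slot); break;
    default: return IT_NULL;
    }
    return reduction_available(t) ? t : IT_NULL;
}
```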

* External32 format converts the original types rather than the internal types.
* Error reporting needs to report the original datatypes to the user rather than the internal ones.
* Reduce_local needs to call the user op function with the original datatypes.

Update for the new internal pairtypes.

The MPI standard does not list MPI_BYTE as a valid type for MPI_MAX. However, I bet any coder would think it sensible to compare the max/min of two byte values. Thus, we (the MPICH implementation) will allow it. Modify the test to check the error case of MPI_FLOAT with MPI_LAND instead.

Add a missing newline in error messages in reduce_local.c. Limit the number of error messages in atomic_rmw_cas.c.

This has been passed and merged by the MPI Forum. Assuming the next MPI standard will be ratified before the next major MPICH release, we use the MPI_ prefix directly rather than the MPIX_ prefix. Also add MPI_TYPECLASS_LOGICAL for the MPI_Type_match_size API. Reference: mpi-forum/mpi-issues#699, mpi-forum/mpi-standard#963

This serves as an example of how we add a new builtin MPI datatype:
1. Define the constant in mpi.h.in.
2. (Optional) Define the internal datatype in mpir_datatype.h if there isn't one already.
   2a. Add the alignment in MPIR_Datatype_builtintype_alignment.
   2b. Add the mapping in MPII_Typerep_get_yaksa_type.
3. Define the mapping in configure.ac.
4. (Optional) Define a case for the supported reduction ops.
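Steps 2a and 2b amount to adding one case to two switches over internal types. A sketch, assuming hypothetical enums: `IT_FLOAT2` plays the newly added type, and `backend_type_t` is a stand-in for yaksa's builtin type enum, not its real names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal types; IT_FLOAT2 is the "new" one. */
typedef enum { IT_INT4, IT_FLOAT4, IT_FLOAT8, IT_FLOAT2 } itype_t;

/* Stand-in for the typerep backend's builtin types (illustrative names). */
typedef enum { BK_NULL, BK_INT32, BK_FLOAT, BK_DOUBLE, BK_FLOAT16 } backend_type_t;

/* Step 2a: alignment per internal type (C11 _Alignof where a C type exists). */
size_t builtin_alignment(itype_t t)
{
    switch (t) {
    case IT_INT4:   return _Alignof(int32_t);
    case IT_FLOAT4: return _Alignof(float);
    case IT_FLOAT8: return _Alignof(double);
    case IT_FLOAT2: return 2; /* assumption: no portable C type, natural alignment */
    }
    return 1;
}

/* Step 2b: map each internal type to the backend type. */
backend_type_t to_backend(itype_t t)
{
    switch (t) {
    case IT_INT4:   return BK_INT32;
    case IT_FLOAT4: return BK_FLOAT;
    case IT_FLOAT8: return BK_DOUBLE;
    case IT_FLOAT2: return BK_FLOAT16;
    }
    return BK_NULL;
}
```

Because internal types are plain enum constants and these switches have no `default`, GCC and Clang with `-Wswitch` (enabled by `-Wall`) can flag any switch that misses the new case.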

Provide a half-precision float sum operation by casting to C float.
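A sketch of the cast-to-float approach on raw IEEE 754 binary16 bit patterns: decode each operand to float, add, and round back with round-to-nearest-even. The function names and the simplified `(in, inout, len)` signature are hypothetical, not the PR's code.

```c
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 bit pattern into a float (exact). */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t mant = h & 0x3ffu;
    uint32_t bits;

    if (exp == 0x1f) {                      /* Inf or NaN */
        bits = sign | 0x7f800000u | (mant << 13);
    } else if (exp == 0) {
        if (mant == 0) {                    /* signed zero */
            bits = sign;
        } else {                            /* subnormal: renormalize */
            exp = 113;                      /* 127 - 15 + 1 */
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                exp--;
            }
            bits = sign | (exp << 23) | ((mant & 0x3ffu) << 13);
        }
    } else {                                /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Encode a float as binary16 with round-to-nearest-even. */
uint16_t float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp = (x >> 23) & 0xffu;
    uint32_t mant = x & 0x7fffffu;

    if (exp == 0xff)                        /* Inf stays Inf, NaN stays NaN */
        return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u : 0));
    int e = (int)exp - 127 + 15;            /* rebias for binary16 */
    if (e >= 31)                            /* overflow to Inf */
        return (uint16_t)(sign | 0x7c00u);
    if (e <= 0) {                           /* subnormal or zero result */
        if (e < -10)                        /* below half the smallest subnormal */
            return sign;
        uint32_t m = mant | 0x800000u;      /* restore implicit leading 1 */
        int shift = 14 - e;
        uint32_t hm = m >> shift;
        uint32_t rem = m & ((1u << shift) - 1);
        uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (hm & 1)))
            hm++;                           /* may carry into smallest normal */
        return (uint16_t)(sign | hm);
    }
    uint32_t h = (uint32_t)sign | ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1fffu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1)))
        h++;                                /* carry may bump exponent, even to Inf */
    return (uint16_t)h;
}

/* inout[i] = in[i] + inout[i], computed in float and rounded back. */
void half_sum(const uint16_t *in, uint16_t *inout, int len)
{
    for (int i = 0; i < len; i++)
        inout[i] = float_to_half(half_to_float(in[i]) + half_to_float(inout[i]));
}
```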

Create MPIR_op_dt_check for reduction op/type validation in the binding layer, where the datatype is a user-input MPI datatype, and use MPIR_Internal_op_dt_check for op/type checks where the datatype is an internal datatype. The binding-layer validation follows the text of the MPI standard literally, while the internal checks are more flexible. For example, internally byte/char are integers and are allowed for all integer operations. But externally, users can only use byte for bit-logic operations, and char is invalid for any reduction op.
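A sketch of the two-level check: the external function follows a simplified, literal reading of the standard's op/type table, and the internal one treats byte and char as 1-byte integers. The enums and function names are hypothetical, and the sketch deliberately does not model the backward-compatibility exceptions MPICH keeps for MPI_CHAR and MPI_AINT.

```c
/* Hypothetical op and datatype-kind enums. */
typedef enum { OP_MAX, OP_SUM, OP_LAND, OP_BAND } op_kind;
typedef enum { K_INTEGER, K_FLOATING, K_LOGICAL, K_BYTE, K_CHAR } dt_kind;

/* Binding layer: simplified literal reading of the standard's table. */
int external_op_dt_ok(op_kind op, dt_kind k)
{
    switch (op) {
    case OP_MAX:
    case OP_SUM:
        return k == K_INTEGER || k == K_FLOATING;
    case OP_LAND:
        return k == K_INTEGER || k == K_LOGICAL;
    case OP_BAND:
        return k == K_INTEGER || k == K_BYTE;
    }
    return 0;
}

/* Internal layer: byte and char are just 1-byte integers here. */
int internal_op_dt_ok(op_kind op, dt_kind k)
{
    if (k == K_BYTE || k == K_CHAR)
        k = K_INTEGER;
    return external_op_dt_ok(op, k);
}
```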

With internal types, the matched datatype, if a match is found, will always be valid for the op. Thus check_dtype is no longer necessary.

We centralized the op/type check with MPIR_op_dt_check and MPIR_Internal_op_dt_check.

Now that each op function is quite simple and they share a lot in common, it is easier to maintain them in a single source file.
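One common way to factor this kind of shared structure in C is a single macro that stamps out the element-wise loop per type. The macro and function names below are hypothetical, not the ones used in the PR.

```c
#include <stdint.h>

/* Generate inout[i] = expr for one element type. Inside expr,
 * `a` is the incoming value and `b` the accumulated one. */
#define DEFINE_REDUCE_OP(name, type, expr)            \
    void name(const type *in, type *inout, int len)   \
    {                                                 \
        for (int i = 0; i < len; i++) {               \
            type a = in[i];                           \
            type b = inout[i];                        \
            inout[i] = (type)(expr);                  \
        }                                             \
    }

DEFINE_REDUCE_OP(sum_int32, int32_t, a + b)
DEFINE_REDUCE_OP(max_int32, int32_t, a > b ? a : b)
DEFINE_REDUCE_OP(sum_double, double, a + b)
DEFINE_REDUCE_OP(band_uint8, uint8_t, a & b)
```

With a fixed set of internal types, adding an (op, type) pair becomes a one-line change in one file.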

The MPI standard does not list MPI_CHAR as a valid type for reductions using builtin ops. Also, multi-language types such as MPI_AINT are not allowed in logical ops.

Allowing MPI_CHAR with builtin ops, or MPI_AINT etc. with logical ops, is non-standard. However, MPICH used to allow it, so this commit preserves the old behavior.

hzhou · 2025-02-04T20:26:19Z

test:mpich/ch3/most
test:mpich/ch4/most

hzhou mentioned this pull request Jan 15, 2025

mpi: Add MPI_LOGICAL{1,2,4,8,16} and MPI_TYPECLASS_LOGICAL #6949 (closed)