Mixing UCX pml and osc performance degradation #9080

Open
@devreal

I have been working on an experimental extension to MPI that is built on top of osc/ucx. Unfortunately, during experiments with PaRSEC I found that the combination of osc/ucx and pml/ucx leads to a significant performance degradation. I have not been able to reproduce this with a small example but I have been able to observe this behavior in a modified PaRSEC runtime where the data payload is transferred through both P2P and RMA operations.

In the PaRSEC runtime, a single thread manages the communication of all task inputs/outputs by sending active messages and posting nonblocking sends and receives. The active messages may lead to a high number of unexpected messages. In the figure below, this configuration is labeled PaRSEC vanilla. A certain loss in per-node performance is expected for such a relatively small Cholesky factorization with small tiles, as more nodes require more active messages and communication becomes dominant.

I have modified the runtime such that the sent payload is also put into a preallocated window, essentially doubling the amount of data transferred (PaRSEC P2P+RMA). This leads to a significant drop in performance at 64 nodes when using pml/ucx. With pml/ob1 (probably using btl/tcp) the baseline is much lower, but the slope matches that of vanilla PaRSEC, and it outperforms pml/ucx at 64 nodes.

To rule out a bandwidth issue, I used a second modified version in which the data is sent twice using send/recv (PaRSEC 2xP2P). In this case, the performance is mostly similar to the vanilla runtime, so bandwidth does not appear to be the limiting factor.

[Figure: dpotrf_1222148-1]

I manually instrumented osc/ucx to warn about long calls into UCX and found that calls to ucp_put_nbi repeatedly took several milliseconds (up to approx. 20ms for a single call). This is likely the reason for the performance drop, as other processes starve while one is stuck in a put. However, I would like to know what is happening there, as I want to combine P2P and RMA going forward. My theory is that the unexpected messages are somehow handled inside the call to put. Interestingly though, the effect seems to get worse when the transfer size is increased (e.g., by increasing the tile size from 128 to 320). Any idea how I can find out what is happening inside UCX?
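For reference, the kind of instrumentation described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch applied to osc/ucx: the names `now_ns`, `warn_if_slow`, `TIMED_CALL`, and the 1 ms threshold `SLOW_CALL_THRESHOLD_NS` are all hypothetical and chosen only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical threshold: report any wrapped call that takes longer than 1 ms. */
#define SLOW_CALL_THRESHOLD_NS 1000000ULL

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Print a warning and return 1 if the interval exceeds the threshold,
 * otherwise return 0. */
static int warn_if_slow(const char *what, uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    uint64_t elapsed = end_ns - start_ns;
    if (elapsed > SLOW_CALL_THRESHOLD_NS) {
        fprintf(stderr, "slow call: %s took %.3f ms\n",
                what, (double)elapsed / 1e6);
        return 1;
    }
    return 0;
}

/* Evaluate `call`, store its value in `result`, and warn if it was slow. */
#define TIMED_CALL(result, call)                          \
    do {                                                  \
        uint64_t timed_start_ = now_ns();                 \
        (result) = (call);                                \
        warn_if_slow(#call, timed_start_, now_ns());      \
    } while (0)
```

Inside osc/ucx, the wrapped expression would be the existing ucp_put_nbi call with whatever arguments the component already passes, for example `TIMED_CALL(status, ucp_put_nbi(...))`. A wrapper like this only shows that a call was slow. It does not show what UCX was doing during that time.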

The system under test is an HPE Apollo with dual-socket AMD Rome 64-core processors and an HDR200 interconnect. Everything is compiled using GCC 10.2.0.

This was tested with a recent Open MPI master (from two weeks ago; I have not seen any changes since then that may have had an impact here). I have observed similar behavior with 4.1.1.

UCX 1.10.0 configuration:

$ ~/opt-hawk/ucx-1.10.0/bin/ucx_info -d
#
# Memory domain: mlx5_1
#     Component: ib
#             register: unlimited, cost: 180 nsec
#           remote key: 8 bytes
#           local memory handle is required for zcopy
#
#      Transport: rc_verbs
#         Device: mlx5_1:1
#  System device: 0000:a3:00.0 (0)
#
#      capabilities:
#            bandwidth: 23588.47/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 75 nsec
#            put_short: <= 124
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 8 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 8 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 123
#             am_bcopy: <= 8255
#             am_zcopy: <= 8255, up to 7 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 127
#               domain: device
#           atomic_add: 64 bit
#          atomic_fadd: 64 bit
#         atomic_cswap: 64 bit
#           connection: to ep
#      device priority: 50
#     device num paths: 1
#              max eps: 256
#       device address: 11 bytes
#           ep address: 17 bytes
#       error handling: peer failure
#
#
#      Transport: rc_mlx5
#         Device: mlx5_1:1
#  System device: 0000:a3:00.0 (0)
#
#      capabilities:
#            bandwidth: 23588.47/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 40 nsec
#            put_short: <= 2K
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 14 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 14 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 2046
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 186
#               domain: device
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to ep
#      device priority: 50
#     device num paths: 1
#              max eps: 256
#       device address: 11 bytes
#           ep address: 7 bytes
#       error handling: buffer (zcopy), remote access, peer failure
#
#
#      Transport: dc_mlx5
#         Device: mlx5_1:1
#  System device: 0000:a3:00.0 (0)
#
#      capabilities:
#            bandwidth: 23588.47/ppn + 0.00 MB/sec
#              latency: 660 nsec
#             overhead: 40 nsec
#            put_short: <= 2K
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 11 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 11 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 2046
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 138
#               domain: device
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 11 bytes
#        iface address: 5 bytes
#       error handling: buffer (zcopy), remote access, peer failure
#
#
#      Transport: ud_verbs
#         Device: mlx5_1:1
#  System device: 0000:a3:00.0 (0)
#
#      capabilities:
#            bandwidth: 23588.47/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 105 nsec
#             am_short: <= 116
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 7 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 3952
#           connection: to ep, to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 11 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure
#
#
#      Transport: ud_mlx5
#         Device: mlx5_1:1
#  System device: 0000:a3:00.0 (0)
#
#      capabilities:
#            bandwidth: 23588.47/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 80 nsec
#             am_short: <= 180
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 132
#           connection: to ep, to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 11 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure
#
#
# Memory domain: mlx5_0
#     Component: ib
#             register: unlimited, cost: 180 nsec
#           remote key: 8 bytes
#           local memory handle is required for zcopy
#
#      Transport: rc_verbs
#         Device: mlx5_0:1
#  System device: 0000:43:00.0 (1)
#
#      capabilities:
#            bandwidth: 23588.47/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 75 nsec
#            put_short: <= 124
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 8 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 8 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 123
#             am_bcopy: <= 8255
#             am_zcopy: <= 8255, up to 7 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 127
#               domain: device
#           atomic_add: 64 bit
#          atomic_fadd: 64 bit
#         atomic_cswap: 64 bit
#           connection: to ep
#      device priority: 50
#     device num paths: 1
#              max eps: 256
#       device address: 11 bytes
#           ep address: 17 bytes
#       error handling: peer failure
#
#
#      Transport: rc_mlx5
#         Device: mlx5_0:1
#  System device: 0000:43:00.0 (1)
#
#      capabilities:
#            bandwidth: 23588.47/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 40 nsec
#            put_short: <= 2K
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 14 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 14 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 2046
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 186
#               domain: device
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to ep
#      device priority: 50
#     device num paths: 1
#              max eps: 256
#       device address: 11 bytes
#           ep address: 7 bytes
#       error handling: buffer (zcopy), remote access, peer failure
#
#
#      Transport: dc_mlx5
#         Device: mlx5_0:1
#  System device: 0000:43:00.0 (1)
#
#      capabilities:
#            bandwidth: 23588.47/ppn + 0.00 MB/sec
#              latency: 660 nsec
#             overhead: 40 nsec
#            put_short: <= 2K
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 11 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 11 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 2046
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 138
#               domain: device
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 11 bytes
#        iface address: 5 bytes
#       error handling: buffer (zcopy), remote access, peer failure
#
#
#      Transport: ud_verbs
#         Device: mlx5_0:1
#  System device: 0000:43:00.0 (1)
#
#      capabilities:
#            bandwidth: 23588.47/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 105 nsec
#             am_short: <= 116
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 7 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 3952
#           connection: to ep, to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 11 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure
#
#
#      Transport: ud_mlx5
#         Device: mlx5_0:1
#  System device: 0000:43:00.0 (1)
#
#      capabilities:
#            bandwidth: 23588.47/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 80 nsec
#             am_short: <= 180
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 132
#           connection: to ep, to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 11 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure
#
#
# Memory domain: rdmacm
#     Component: rdmacm
#           supports client-server connection establishment via sockaddr
#   < no supported devices found >
#
# Connection manager: rdmacm
#      max_conn_priv: 54 bytes
#
# Memory domain: cma
#     Component: cma
#             register: unlimited, cost: 9 nsec
#
#      Transport: cma
#         Device: memory
#  System device: <unknown>
#
#      capabilities:
#            bandwidth: 0.00/ppn + 11145.00 MB/sec
#              latency: 80 nsec
#             overhead: 400 nsec
#            put_zcopy: unlimited, up to 16 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_zcopy: unlimited, up to 16 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 4 bytes
#       error handling: none
#
$ ~/opt-hawk/ucx-1.10.0/bin/ucx_info -b
#define UCX_CONFIG_H              
#define ENABLE_ASSERT             1
#define ENABLE_BUILTIN_MEMCPY     1
#define ENABLE_DEBUG_DATA         0
#define ENABLE_MT                 1
#define ENABLE_PARAMS_CHECK       1
#define ENABLE_STATS              1
#define ENABLE_SYMBOL_OVERRIDE    1
#define HAVE_ALLOCA               1
#define HAVE_ALLOCA_H             1
#define HAVE_ATTRIBUTE_NOOPTIMIZE 1
#define HAVE_CLEARENV             1
#define HAVE_CPU_SET_T            1
#define HAVE_DC_EXP               1
#define HAVE_DECL_ASPRINTF        1
#define HAVE_DECL_BASENAME        1
#define HAVE_DECL_CPU_ISSET       1
#define HAVE_DECL_CPU_ZERO        1
#define HAVE_DECL_ETHTOOL_CMD_SPEED 1
#define HAVE_DECL_FMEMOPEN        1
#define HAVE_DECL_F_SETOWN_EX     1
#define HAVE_DECL_IBV_ACCESS_ON_DEMAND 1
#define HAVE_DECL_IBV_ACCESS_RELAXED_ORDERING 0
#define HAVE_DECL_IBV_ADVISE_MR   0
#define HAVE_DECL_IBV_ALLOC_DM    0
#define HAVE_DECL_IBV_ALLOC_TD    0
#define HAVE_DECL_IBV_CMD_MODIFY_QP 1
#define HAVE_DECL_IBV_CREATE_CQ_ATTR_IGNORE_OVERRUN 0
#define HAVE_DECL_IBV_CREATE_QP_EX 1
#define HAVE_DECL_IBV_CREATE_SRQ  1
#define HAVE_DECL_IBV_CREATE_SRQ_EX 1
#define HAVE_DECL_IBV_EVENT_GID_CHANGE 1
#define HAVE_DECL_IBV_EVENT_TYPE_STR 1
#define HAVE_DECL_IBV_EXP_ACCESS_ALLOCATE_MR 1
#define HAVE_DECL_IBV_EXP_ACCESS_ON_DEMAND 1
#define HAVE_DECL_IBV_EXP_ALLOC_DM 1
#define HAVE_DECL_IBV_EXP_ATOMIC_HCA_REPLY_BE 1
#define HAVE_DECL_IBV_EXP_CQ_IGNORE_OVERRUN 1
#define HAVE_DECL_IBV_EXP_CQ_MODERATION 1
#define HAVE_DECL_IBV_EXP_CREATE_QP 1
#define HAVE_DECL_IBV_EXP_CREATE_RES_DOMAIN 1
#define HAVE_DECL_IBV_EXP_CREATE_SRQ 1
#define HAVE_DECL_IBV_EXP_DCT_OOO_RW_DATA_PLACEMENT 1
#define HAVE_DECL_IBV_EXP_DESTROY_RES_DOMAIN 1
#define HAVE_DECL_IBV_EXP_DEVICE_ATTR_PCI_ATOMIC_CAPS 1
#define HAVE_DECL_IBV_EXP_DEVICE_ATTR_RESERVED_2 1
#define HAVE_DECL_IBV_EXP_DEVICE_DC_TRANSPORT 1
#define HAVE_DECL_IBV_EXP_DEVICE_MR_ALLOCATE 1
#define HAVE_DECL_IBV_EXP_MR_FIXED_BUFFER_SIZE 1
#define HAVE_DECL_IBV_EXP_MR_INDIRECT_KLMS 1
#define HAVE_DECL_IBV_EXP_ODP_SUPPORT_IMPLICIT 1
#define HAVE_DECL_IBV_EXP_POST_SEND 1
#define HAVE_DECL_IBV_EXP_PREFETCH_MR 1
#define HAVE_DECL_IBV_EXP_PREFETCH_WRITE_ACCESS 1
#define HAVE_DECL_IBV_EXP_QPT_DC_INI 1
#define HAVE_DECL_IBV_EXP_QP_CREATE_UMR 1
#define HAVE_DECL_IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG 1
#define HAVE_DECL_IBV_EXP_QP_INIT_ATTR_RES_DOMAIN 1
#define HAVE_DECL_IBV_EXP_QP_OOO_RW_DATA_PLACEMENT 1
#define HAVE_DECL_IBV_EXP_QUERY_DEVICE 1
#define HAVE_DECL_IBV_EXP_QUERY_GID_ATTR 1
#define HAVE_DECL_IBV_EXP_REG_MR  1
#define HAVE_DECL_IBV_EXP_RES_DOMAIN_THREAD_MODEL 1
#define HAVE_DECL_IBV_EXP_SEND_EXT_ATOMIC_INLINE 1
#define HAVE_DECL_IBV_EXP_SETENV  1
#define HAVE_DECL_IBV_EXP_WR_EXT_MASKED_ATOMIC_CMP_AND_SWP 1
#define HAVE_DECL_IBV_EXP_WR_EXT_MASKED_ATOMIC_FETCH_AND_ADD 1
#define HAVE_DECL_IBV_EXP_WR_NOP  1
#define HAVE_DECL_IBV_GET_ASYNC_EVENT 1
#define HAVE_DECL_IBV_GET_DEVICE_NAME 1
#define HAVE_DECL_IBV_LINK_LAYER_ETHERNET 1
#define HAVE_DECL_IBV_LINK_LAYER_INFINIBAND 1
#define HAVE_DECL_IBV_MLX5_EXP_GET_CQ_INFO 1
#define HAVE_DECL_IBV_MLX5_EXP_GET_QP_INFO 1
#define HAVE_DECL_IBV_MLX5_EXP_GET_SRQ_INFO 1
#define HAVE_DECL_IBV_MLX5_EXP_UPDATE_CQ_CI 1
#define HAVE_DECL_IBV_ODP_SUPPORT_IMPLICIT 0
#define HAVE_DECL_IBV_QPF_GRH_REQUIRED 0
#define HAVE_DECL_IBV_QUERY_DEVICE_EX 1
#define HAVE_DECL_IBV_QUERY_GID   1
#define HAVE_DECL_IBV_WC_STATUS_STR 1
#define HAVE_DECL_IPPROTO_TCP     1
#define HAVE_DECL_MADV_FREE       1
#define HAVE_DECL_MADV_REMOVE     1
#define HAVE_DECL_MLX5_WQE_CTRL_SOLICITED 1
#define HAVE_DECL_POSIX_MADV_DONTNEED 1
#define HAVE_DECL_PR_SET_PTRACER  1
#define HAVE_DECL_RDMA_ESTABLISH  1
#define HAVE_DECL_RDMA_INIT_QP_ATTR 1
#define HAVE_DECL_SOL_SOCKET      1
#define HAVE_DECL_SO_KEEPALIVE    1
#define HAVE_DECL_SPEED_UNKNOWN   1
#define HAVE_DECL_STRERROR_R      1
#define HAVE_DECL_SYS_BRK         1
#define HAVE_DECL_SYS_IPC         0
#define HAVE_DECL_SYS_MADVISE     1
#define HAVE_DECL_SYS_MMAP        1
#define HAVE_DECL_SYS_MREMAP      1
#define HAVE_DECL_SYS_MUNMAP      1
#define HAVE_DECL_SYS_SHMAT       1
#define HAVE_DECL_SYS_SHMDT       1
#define HAVE_DECL_TCP_KEEPCNT     1
#define HAVE_DECL_TCP_KEEPIDLE    1
#define HAVE_DECL_TCP_KEEPINTVL   1
#define HAVE_DECL___PPC_GET_TIMEBASE_FREQ 0
#define HAVE_DLFCN_H              1
#define HAVE_EXP_UMR              1
#define HAVE_EXP_UMR_KSM          1
#define HAVE_HW_TIMER             1
#define HAVE_IB                   1
#define HAVE_IBV_DM               1
#define HAVE_IBV_EXP_DM           1
#define HAVE_IBV_EXP_QP_CREATE_UMR 1
#define HAVE_IBV_EXP_RES_DOMAIN   1
#define HAVE_IB_EXT_ATOMICS       1
#define HAVE_IN6_ADDR_S6_ADDR32   1
#define HAVE_INFINIBAND_MLX5_HW_H 1
#define HAVE_INTTYPES_H           1
#define HAVE_IP_IP_DST            1
#define HAVE_LIBGEN_H             1
#define HAVE_LIBRT                1
#define HAVE_LINUX_FUTEX_H        1
#define HAVE_LINUX_IP_H           1
#define HAVE_LINUX_MMAN_H         1
#define HAVE_MALLOC_H             1
#define HAVE_MALLOC_HOOK          1
#define HAVE_MALLOC_TRIM          1
#define HAVE_MASKED_ATOMICS_ENDIANNESS 1
#define HAVE_MEMALIGN             1
#define HAVE_MEMORY_H             1
#define HAVE_MLX5_HW              1
#define HAVE_MLX5_HW_UD           1
#define HAVE_MREMAP               1
#define HAVE_NETINET_IP_H         1
#define HAVE_NET_ETHERNET_H       1
#define HAVE_NUMA                 1
#define HAVE_NUMAIF_H             1
#define HAVE_NUMA_H               1
#define HAVE_ODP                  1
#define HAVE_ODP_IMPLICIT         1
#define HAVE_POSIX_MEMALIGN       1
#define HAVE_PREFETCH             1
#define HAVE_PROFILING            1
#define HAVE_RDMACM_QP_LESS       1
#define HAVE_SCHED_GETAFFINITY    1
#define HAVE_SCHED_SETAFFINITY    1
#define HAVE_SIGACTION_SA_RESTORER 1
#define HAVE_SIGEVENT_SIGEV_UN_TID 1
#define HAVE_SIGHANDLER_T         1
#define HAVE_STDINT_H             1
#define HAVE_STDLIB_H             1
#define HAVE_STRERROR_R           1
#define HAVE_STRINGS_H            1
#define HAVE_STRING_H             1
#define HAVE_STRUCT_BITMASK       1
#define HAVE_STRUCT_IBV_ASYNC_EVENT_ELEMENT_DCT 1
#define HAVE_STRUCT_IBV_EXP_CREATE_SRQ_ATTR_DC_OFFLOAD_PARAMS 1
#define HAVE_STRUCT_IBV_EXP_DEVICE_ATTR_EXP_DEVICE_CAP_FLAGS 1
#define HAVE_STRUCT_IBV_EXP_DEVICE_ATTR_ODP_CAPS 1
#define HAVE_STRUCT_IBV_EXP_DEVICE_ATTR_ODP_CAPS_PER_TRANSPORT_CAPS_DC_ODP_CAPS 1
#define HAVE_STRUCT_IBV_EXP_DEVICE_ATTR_ODP_MR_MAX_SIZE 1
#define HAVE_STRUCT_IBV_EXP_QP_INIT_ATTR_MAX_INL_RECV 1
#define HAVE_STRUCT_IBV_MLX5_QP_INFO_BF_NEED_LOCK 1
#define HAVE_STRUCT_MLX5_AH_IBV_AH 1
#define HAVE_STRUCT_MLX5_CQE64_IB_STRIDE_INDEX 1
#define HAVE_STRUCT_MLX5_GRH_AV_RMAC 1
#define HAVE_STRUCT_MLX5_SRQ_CMD_QP 1
#define HAVE_STRUCT_MLX5_WQE_AV_BASE 1
#define HAVE_SYS_EPOLL_H          1
#define HAVE_SYS_EVENTFD_H        1
#define HAVE_SYS_STAT_H           1
#define HAVE_SYS_TYPES_H          1
#define HAVE_SYS_UIO_H            1
#define HAVE_TL_DC                1
#define HAVE_TL_RC                1
#define HAVE_TL_UD                1
#define HAVE_UCM_PTMALLOC286      1
#define HAVE_UNISTD_H             1
#define HAVE_VERBS_EXP_H          1
#define HAVE___CLEAR_CACHE        1
#define HAVE___CURBRK             1
#define HAVE___SIGHANDLER_T       1
#define IBV_HW_TM                 1
#define LT_OBJDIR                 ".libs/"
#define NVALGRIND                 1
#define PACKAGE                   "ucx"
#define PACKAGE_BUGREPORT         ""
#define PACKAGE_NAME              "ucx"
#define PACKAGE_STRING            "ucx 1.10"
#define PACKAGE_TARNAME           "ucx"
#define PACKAGE_URL               ""
#define PACKAGE_VERSION           "1.10"
#define STDC_HEADERS              1
#define STRERROR_R_CHAR_P         1
#define UCM_BISTRO_HOOKS          1
#define UCS_MAX_LOG_LEVEL         UCS_LOG_LEVEL_INFO
#define UCT_TCP_EP_KEEPALIVE      1
#define UCT_UD_EP_DEBUG_HOOKS     0
#define UCX_CONFIGURE_FLAGS       "--prefix=/zhome/academic/HLRS/hlrs/hpcjschu/opt-hawk/ucx-1.10.0 --enable-profiling --enable-stats --enable-mt --disable-backtrace-detail --without-mpi --enable-mt --enable-cma --without-cuda --without-gdrcopy --with-verbs --with-rdmacm --disable-logging CC=gcc CXX=g++"
#define UCX_MODULE_SUBDIR         "ucx"
#define VERSION                   "1.10"
#define restrict                  __restrict
#define test_MODULES              ":module"
#define ucm_MODULES               ""
#define uct_MODULES               ":ib:rdmacm:cma"
#define uct_cuda_MODULES          ""
#define uct_ib_MODULES            ""
#define uct_rocm_MODULES          ""
#define ucx_perftest_MODULES      ""
