Verbs Build Instructions
Recent releases of libfabric include a verbs provider that can be used in conjunction with the RXM utility provider to run SOS on Verbs networks (e.g. InfiniBand or RoCE Ethernet).
Please use the ofi_rxm and verbs providers of OFI libfabric.
Libfabric can be set up to use the ofi_rxm and verbs providers with the following configuration. SOS verbs support requires libfabric v1.9.0 or newer.
$ ./configure --prefix=<libfabric_install_dir> --enable-verbs --enable-rxm --disable-psm2 --disable-sockets --disable-usnic --disable-udp --disable-rxd
where <libfabric_install_dir> is an appropriate installation path. The --disable-* flags are optional, but may help ensure that SOS chooses the ofi_rxm and verbs providers.
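As a quick check of the version requirement, the sketch below compares the installed libfabric version against 1.9.0. It assumes that fi_info from the libfabric installation is on your PATH, that `fi_info --version` prints a line of the form `libfabric: X.Y.Z`, and that GNU `sort -V` is available. The helper name `version_ge` is hypothetical.

```shell
# Hypothetical helper: succeed if version $1 is >= version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.9.0"

# Only query fi_info if it is actually installed and on PATH.
if command -v fi_info >/dev/null 2>&1; then
    # Assumption: the version output contains a line like "libfabric: 1.9.0".
    installed="$(fi_info --version | awk '/^libfabric:/ {print $2}')"
    if version_ge "$installed" "$required"; then
        echo "libfabric $installed is new enough for SOS verbs support"
    else
        echo "libfabric '$installed' is older than $required (or could not be parsed)" >&2
    fi
fi
```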
Alternatively, SOS can be instructed to use these providers at run time by setting the following environment variable:
SHMEM_OFI_PROVIDER="verbs;ofi_rxm"
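In a POSIX shell, the variable could be set as in the sketch below. The optional check assumes fi_info is on your PATH and that its `-p` filter accepts the layered provider name; treat that as an assumption to verify against your libfabric version.

```shell
# Select the layered verbs + RXM provider for SOS at run time.
# The quotes matter: an unquoted ';' would terminate the shell command.
export SHMEM_OFI_PROVIDER="verbs;ofi_rxm"

# Optional sanity check: ask libfabric whether it reports this provider.
if command -v fi_info >/dev/null 2>&1; then
    fi_info -p "$SHMEM_OFI_PROVIDER" \
        || echo "fi_info did not report provider $SHMEM_OFI_PROVIDER" >&2
fi
```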
Use the following configure options to build SOS to use the libfabric ofi_rxm and verbs providers:
$ ./autogen.sh
$ ./configure --prefix=<SOS_install_dir> --with-ofi=<libfabric_install_dir> --enable-pmi-simple --enable-hard-polling --enable-ofi-mr=basic
$ make
$ make install
where <SOS_install_dir> is an appropriate installation path for SOS and <libfabric_install_dir> is the libfabric installation path with the ofi_rxm and verbs providers enabled. Hard polling must be enabled with the --enable-hard-polling flag, and basic memory registration for OFI should be enabled with the --enable-ofi-mr=basic flag.
The SOS build should be added to your path. If you followed the build instructions above, you can do this by running the following command:
$ export PATH=<SOS_install_dir>/bin:$PATH
Once SOS is in your path, you can use the compiler wrapper oshcc to compile your application and the launcher wrapper oshrun to run it. Please ensure that you have a compatible launcher, such as Hydra 3.2 or newer.
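A minimal smoke test of the installation might look like the sketch below. The file name `hello_shmem.c`, the scratch directory, and the PE count of 2 are illustrative choices, not part of SOS; the C program uses only standard OpenSHMEM calls.

```shell
# Work in a scratch directory so nothing in the current tree is overwritten.
workdir="$(mktemp -d)"

# Write a minimal OpenSHMEM program (illustrative file name).
cat > "$workdir/hello_shmem.c" <<'EOF'
#include <stdio.h>
#include <shmem.h>

int main(void) {
    shmem_init();
    printf("Hello from PE %d of %d\n", shmem_my_pe(), shmem_n_pes());
    shmem_finalize();
    return 0;
}
EOF

# Compile and launch only if the SOS wrappers are on PATH.
if command -v oshcc >/dev/null 2>&1 && command -v oshrun >/dev/null 2>&1; then
    oshcc "$workdir/hello_shmem.c" -o "$workdir/hello_shmem"
    oshrun -np 2 "$workdir/hello_shmem"
else
    echo "oshcc/oshrun not found; check that <SOS_install_dir>/bin is on PATH" >&2
fi
```

If the build and launcher are working, each PE should print one greeting line.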
Some versions of libfabric are known to have an issue in the memory registration cache when running multithreaded SHMEM applications. As a workaround, the following environment variable can be set:
FI_MR_CACHE_MAX_COUNT=0
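If you would rather not disable the cache for every job in your shell session, one option is to scope the setting to a single launch. The sketch below is one way to do that; the helper name `run_without_mr_cache` and the application name in the usage comment are hypothetical.

```shell
# Hypothetical helper: run one command with the libfabric memory
# registration cache disabled, leaving the calling shell's environment as is.
run_without_mr_cache() {
    env FI_MR_CACHE_MAX_COUNT=0 "$@"
}

# Usage (hypothetical application name):
#   run_without_mr_cache oshrun -np 2 ./my_shmem_app
```

Note that in shell syntax the assignment must not contain spaces around the `=` sign.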
Please refer to the Troubleshooting wiki page if you encounter any issues. If your particular problem is not covered in the wiki, please submit an issue through GitHub.