Earlier, @JiakunYan and I debugged an issue in which an HPX-based application crashed when using the LCI parcelport. That crash has since been resolved; please refer to #6526.
Here is my test code:

```cpp
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

#include "Xlog.h"
#include "Iface.h"
// #include "hpx.hpp"
#include "hpx/hpx_start.hpp"
#include "hpx/version.hpp"
#include "hpx/init.hpp"
#include "hpx/runtime.hpp"
#include "hpx/include/actions.hpp"
#include "hpx/include/lcos.hpp"
#include "hpx/include/async.hpp"
#include "TestHpx.h"

using namespace std;

int hpx_main(int argc, char* argv[])
{
    TRACE("hpx_main function");

    hpx::error_code ec = hpx::make_success_code();
    std::vector<hpx::id_type> localities = hpx::find_all_localities(ec);
    if (hpx::error::success != ec.value())
    {
        ERROR("find_all_localities executed failed, %s", ec.get_message().c_str());
        // Call hpx::finalize() on every exit path of hpx_main so the runtime
        // can shut down and hpx::stop() in main() can return.
        hpx::finalize();
        return -1;
    }

    if (localities.size() < 2)
    {
        ERROR("this program requires at least two localities");
        hpx::finalize();
        return -2;
    }

    // size_t needs %zu; the locality id is a std::uint32_t, so use %u / %08X
    INFO("num of localities: %zu", localities.size());
    for (const auto& loc : localities)
    {
        hpx::naming::gid_type gid = loc.get_gid();
        std::string address = hpx::get_locality_name(loc).get();
        std::uint32_t localityId = hpx::naming::get_locality_id_from_gid(gid);
        DEBUG("locality id: %u", localityId);
        DEBUG("locality name: %s, id: %08X", address.c_str(), localityId);
    }

    getchar();
    return hpx::finalize();
}

int main(int argc, char** argv)
{
    auto ret = xlogInitFile("conf/LogConf.yaml");
    if (false == ret)
    {
        cerr << "logger init failed." << endl;
        return -1;
    }
    INFO("hpx demostration running...");

    auto hpxMajor = hpx::major_version();
    auto hpxMinor = hpx::minor_version();
    auto hpxPatch = hpx::subminor_version();
    INFO("hpx version: %d-%d-%d", hpxMajor, hpxMinor, hpxPatch);

    // HPX runtime configuration overrides
    auto hpxCfg = vector<string>();
    hpxCfg.push_back("hpx.handle_signals=0");
    hpxCfg.push_back("hpx.max_idle_loop_count=1000");
    hpxCfg.push_back("hpx.max_idle_backoff_time=1000");

    hpx::init_params initArgs;
    initArgs.cfg = std::move(hpxCfg);

    // hpx::start() launches the runtime, schedules hpx_main and returns
    // without waiting for hpx_main to finish
    ret = hpx::start(argc, argv, initArgs);
    if (false == ret)
    {
        ERROR("hpx runtime init failed");
        return -3;
    }

    getchar();
    WARN("hpx demostration exiting...");
    return hpx::stop();
}
```
Depending on which command I use to start my HPX program, I run into different problems: errors, crashes, or the program never entering the `hpx_main` function.
```shell
srun --mpi=pmi2 --nodelist=DellNode0,AsusNode1 --ntasks=2 --ntasks-per-node=1 -p MyTestRxe HpxDemo_d.elf
```

```
2024-08-27 10:01:23.088 - INFO - hpx demostration running...
2024-08-27 10:01:23.088 - INFO - hpx version: 1-10-0
2024-08-27 10:01:23.091 - INFO - hpx demostration running...
2024-08-27 10:01:23.091 - INFO - hpx version: 1-10-0
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[DellNode0:450384] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: DellNode0: task 0: Exited with exit code 1
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[AsusNode1:441818] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: AsusNode1: task 1: Exited with exit code 1
```
Could these issues be related to the HPX runtime? Could someone please help me?

@phil-skillwon My previous understanding was that you also get the same behavior with only one task. However, all the examples you posted here use 2 tasks. Could you confirm whether your program runs successfully with one task?
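To compare one-task and two-task launches (and `--mpi=pmi2` vs `--mpi=pmix`) before HPX or MPI is initialized, it can help to print what each `srun` task actually receives from Slurm. Below is a minimal C++ sketch of such a check; it is not part of HPX or of the test program. The helper names `read_env_long`, `slurm_reports_enough_tasks`, and `describe_slurm_task` are hypothetical, the sketch relies on Slurm's standard `SLURM_PROCID` and `SLURM_NTASKS` variables, and it assumes `PMIX_RANK` is only set when the job is launched with the pmix plugin.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>

// Reads an environment variable and parses it as a non-negative base-10
// integer. Returns std::nullopt if it is unset, empty, or not a number.
std::optional<long> read_env_long(const char* name)
{
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0')
        return std::nullopt;

    char* end = nullptr;
    long parsed = std::strtol(value, &end, 10);
    if (*end != '\0' || parsed < 0)
        return std::nullopt;
    return parsed;
}

// True when Slurm reports at least `required` tasks and this task's rank
// lies in [0, SLURM_NTASKS). The test program needs two localities.
bool slurm_reports_enough_tasks(long required)
{
    auto rank = read_env_long("SLURM_PROCID");
    auto ntasks = read_env_long("SLURM_NTASKS");
    return rank && ntasks && *ntasks >= required && *rank < *ntasks;
}

// One-line summary of what this task received from the launcher.
// PMIX_RANK is assumed to be present only under --mpi=pmix.
std::string describe_slurm_task()
{
    auto rank = read_env_long("SLURM_PROCID");
    auto ntasks = read_env_long("SLURM_NTASKS");
    auto pmixRank = read_env_long("PMIX_RANK");

    std::ostringstream out;
    out << "SLURM_PROCID=" << (rank ? std::to_string(*rank) : "unset")
        << " SLURM_NTASKS=" << (ntasks ? std::to_string(*ntasks) : "unset")
        << " PMIX_RANK=" << (pmixRank ? std::to_string(*pmixRank) : "unset");
    return out.str();
}
```

For example, `std::cerr << describe_slurm_task() << '\n';` could be placed at the top of `main()`, before `hpx::start`. With `--ntasks=2`, each task should report a distinct `SLURM_PROCID` and `SLURM_NTASKS=2`.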