[Build] How to call AMGX in parallel with OpenMPI and multiple GPUs? #324
Comments
How can I call AMGX's AMGX_matrix_upload_all_global function to solve a five-diagonal banded sparse matrix in parallel on multiple GPUs? How should the halos between different GPUs be handled during the parallel run? How is the matrix constructed in amgx_mpi_poisson5pt.c? Why isn't it a five-diagonal banded sparse matrix?
Hi @ZiwuZheng, sorry for the delayed reply. Let me answer your second post first.
We don't focus on stencil cases alone; the goal is a solver that can also handle unstructured cases. The matrix is the same 5-point stencil, just represented in CSR format. To solve your banded matrix, you can convert it to CSR first and then pass it to AMGX.
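As a rough illustration of that point (my own sketch, not code taken from the repository): the 5-point Laplacian on an nx x ny grid with Dirichlet boundaries can be written straight into CSR, and the "five diagonals" simply become the (at most) five non-zeros per row:

```c
/* Sketch: assemble the 5-point Poisson stencil for an nx x ny grid in CSR.
 * Rows are numbered i = iy*nx + ix; only existing neighbours are stored,
 * which is why the result is CSR rather than five explicit diagonals.
 * row_ptrs has nx*ny+1 entries; col_idx and vals need up to 5*nx*ny. */
#include <stdint.h>

void build_poisson5pt_csr(int nx, int ny,
                          int *row_ptrs, int64_t *col_idx, double *vals)
{
    int nnz = 0;
    row_ptrs[0] = 0;
    for (int iy = 0; iy < ny; iy++) {
        for (int ix = 0; ix < nx; ix++) {
            int64_t i = (int64_t)iy * nx + ix;
            if (iy > 0)      { col_idx[nnz] = i - nx; vals[nnz++] = -1.0; } /* south */
            if (ix > 0)      { col_idx[nnz] = i - 1;  vals[nnz++] = -1.0; } /* west  */
            col_idx[nnz] = i; vals[nnz++] = 4.0;                            /* diagonal */
            if (ix < nx - 1) { col_idx[nnz] = i + 1;  vals[nnz++] = -1.0; } /* east  */
            if (iy < ny - 1) { col_idx[nnz] = i + nx; vals[nnz++] = -1.0; } /* north */
            row_ptrs[i + 1] = nnz;
        }
    }
}
```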
AMGX will handle this for you. There are a few functions for uploading your matrix data to the solver. For example, you can find information on the …
If you already have your halo information set up, you might want to look at the upload functions …
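As a rough sketch of the distributed upload path (argument order as I recall it from the v2.x C API header; treat it as an illustration and check examples/amgx_mpi_poisson7.c and the API reference for the authoritative usage): each rank passes only the rows it owns, with 64-bit global column indices, and AMGX derives the halo exchange from those indices.

```c
/* Sketch of the distributed upload (error checking omitted).
 * Each rank passes only its n owned rows of the n_global-row system;
 * col_idx holds 64-bit *global* column indices, and AMGX derives the
 * halo exchange from the indices that fall outside the local block.
 * partition_vector == NULL is taken to mean "contiguous row blocks,
 * ordered by rank"; nrings = 1 suffices for a 5-point stencil. */
int nrings = 1;
AMGX_matrix_upload_all_global(A,
                              n_global,          /* global number of rows       */
                              n,                 /* rows owned by this rank     */
                              nnz,               /* non-zeros in the local rows */
                              1, 1,              /* block_dimx, block_dimy      */
                              row_ptrs,          /* local row offsets, n+1      */
                              col_idx,           /* int64_t global columns      */
                              vals,              /* matrix values               */
                              NULL,              /* no separate diagonal array  */
                              nrings,            /* allocated_halo_depth        */
                              nrings,            /* num_import_rings            */
                              NULL);             /* NULL => contiguous split    */
```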
Hi marsaev, thank you very much for your reply! I am solving a two-dimensional Poisson equation with Dirichlet and Neumann boundary conditions, which yields a five-diagonal banded sparse matrix. I have converted the matrix to CSR format and uploaded it with AMGX_matrix_upload_all_global. The solution was validated on a single GPU, but errors occur when solving on multiple GPUs (as mentioned in the first issue above). After modifying the halo information of the CSR matrix, the code runs, but the results are incorrect at locations adjacent to the boundaries between GPUs (as shown in the figure below, which shows the results of 4 GPUs running in parallel). How can I modify the code to obtain the correct result?
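One thing worth double-checking (this is my reading of the API, so please verify it against the reference and examples/amgx_mpi_poisson7.c): with AMGX_matrix_upload_all_global each rank should upload only the rows it owns, keeping the column indices global, and it should not add ghost rows for the neighbouring rank's cells; AMGX constructs the halo layer itself from the off-rank column indices. For a row-wise split of the grid, the bookkeeping looks roughly like this:

```c
/* Sketch: row-wise split of the nx x ny grid over nranks MPI ranks.
 * Rank 'rank' owns global rows [row_start, row_end); a stencil entry that
 * reaches into a neighbouring rank's block is expressed only through its
 * global column index, never as an extra local (ghost) row. */
int64_t n_global  = (int64_t)nx * ny;
int64_t rows_per  = n_global / nranks;
int64_t row_start = (int64_t)rank * rows_per;
int64_t row_end   = (rank == nranks - 1) ? n_global : row_start + rows_per;
int     n         = (int)(row_end - row_start);

/* e.g. the "south" neighbour of this rank's first owned row lives on
 * rank-1, but it is still written as a plain global column index: */
int64_t south_of_first = row_start - nx;   /* only meaningful if row_start >= nx */
```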
Is there a chance to obtain a tiny version of your problem so we can visually inspect how the data is handled?
Would it be possible to make the problem, say, 16x16 and try to solve it on 2 GPUs? That way we could take a look at what data is passed to AMGX and whether it aligns with the API expectations.
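If it helps with that inspection, AMGX can write out what it actually received; a minimal sketch (the filename is just an example, and I would double-check how AMGX_write_system behaves in the distributed case):

```c
/* Sketch: after uploading A, b and x for the small 16x16 test case,
 * dump the assembled system so the partitioning can be inspected offline. */
AMGX_write_system(A, b, x, "poisson_16x16_2gpu.mtx");
```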
Describe the issue
I obtain correct results when using a single GPU to call AMGX to solve a system of linear equations (Poisson's equation), but when I call AMGX from multiple GPUs with OpenMPI, an error occurs. The error may come from uploading the matrix with AMGX_matrix_upload_all_global, or from wrong parameters when creating the configuration file or the solver. How should the parameters of AMGX_matrix_upload_all_global be given when calling AMGX with multiple GPUs? In particular, how is the ghost grid set up, and what are the requirements on the matrix when setting up ghost grids? What are the precautions when creating the configuration file, the solver, the parallel environment, and so on?
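For comparison, here is a hedged sketch of the setup sequence I would expect, modelled on the MPI examples shipped with AMGX (e.g. amgx_mpi_poisson7.c); `cfg_file`, `h_b` and `h_x` are placeholder names. The key precautions are one GPU per rank, `exception_handling=1` in the config, resources created with the MPI communicator, and vectors bound to the distributed matrix before they are uploaded with the local size n:

```c
/* Sketch of the multi-GPU call sequence (error checking omitted).
 * Assumes <mpi.h>, <cuda_runtime.h> and <amgx_c.h> are included and
 * MPI_Init() has already been called. */
MPI_Comm amgx_comm = MPI_COMM_WORLD;
int rank, nranks, ndevices, device;
MPI_Comm_rank(amgx_comm, &rank);
MPI_Comm_size(amgx_comm, &nranks);
cudaGetDeviceCount(&ndevices);
device = rank % ndevices;                       /* one GPU per MPI rank        */

AMGX_initialize();
AMGX_config_handle cfg;
AMGX_config_create_from_file(&cfg, cfg_file);   /* cfg_file: placeholder path  */
AMGX_config_add_parameters(&cfg, "exception_handling=1");

AMGX_resources_handle rsrc;
AMGX_resources_create(&rsrc, cfg, &amgx_comm, 1, &device);

AMGX_matrix_handle A;
AMGX_vector_handle b, x;
AMGX_solver_handle solver;
AMGX_matrix_create(&A, rsrc, AMGX_mode_dDDI);
AMGX_vector_create(&b, rsrc, AMGX_mode_dDDI);
AMGX_vector_create(&x, rsrc, AMGX_mode_dDDI);
AMGX_solver_create(&solver, rsrc, AMGX_mode_dDDI, cfg);

/* ... upload the local rows with AMGX_matrix_upload_all_global here ... */

/* Bind the vectors to the distributed matrix BEFORE uploading them, so
 * AMGX can allocate room for the halo elements; upload only the n owned
 * values per rank (h_b, h_x are placeholder host arrays of length n). */
AMGX_vector_bind(b, A);
AMGX_vector_bind(x, A);
AMGX_vector_upload(b, n, 1, h_b);
AMGX_vector_upload(x, n, 1, h_x);

AMGX_solver_setup(solver, A);
AMGX_solver_solve(solver, b, x);
AMGX_vector_download(x, h_x);
```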
Environment information:
OS: Ubuntu 20.04
Compiler: mpicxx
CMake: 3.23
CUDA: 12.2
OpenMPI: 4.0.3
AMGX: v2.3.0
Compilation information:
mpicxx -cuda -gpu=ccall,cuda12.2 CSR_3Dplan_global.cu -L /home/zcy/software/AMGX-main-nvhpc/lib -lamgxsh -I /home/zcy/software/AMGX-main-nvhpc/include/ -L /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/comm_libs/mpi/lib/ -lmpi -lmpi_cxx -L /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/math_libs/12.2/lib64 -lcufft -I /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/math_libs/12.2/include
Issue information
at: /home/stu1/software/AMGX-main-nvhpc/src/distributed/comms_visitors3.cu:23
Stack trace:
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : ()+0x21eb697
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::ExcHalo2AsyncFunctor<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2>, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > >::operator()(amgx::CommsMPIHostBufferStream<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&)+0
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : void amgx::CommsMPIHostBufferStream<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::do_exchange_halo<amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > >(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > const&, int)+0x206
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : void amgx::multiply<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >(amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::ViewType)+0xf45
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : void amgx::axmb<amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > >(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, int, int)+0x58
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::FGMRES_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve_iteration(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x9e8
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x594
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve_no_throw(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, …
Caught amgx exception: Vector size too small: not enough space for halo elements.
Vector: {tag = 1, size = 288}
Required size: 304
Looking forward to your answer! Best wishes!