Merge in Jennifer's changes #2

Open
wants to merge 20 commits into
base: hpgmp-dev
0a8bc8b
initial Cuda-enable version
iyamazaki Feb 26, 2022
0059697
Fix issue with Makefile.ext and config.
jennloe Feb 28, 2022
51a082d
Merge remote-tracking branch 'upstream/hpgmp-dev' into hpgmp-dev
jennloe Feb 28, 2022
86d646a
Merge branch 'hpgmp-cuda' into hpgmp-dev
jennloe Feb 28, 2022
9fdcf4c
add flops to GMRES
iyamazaki Mar 1, 2022
2fd22a4
use SpMV for restriction & prologation
iyamazaki Mar 1, 2022
2c0b12d
Merge branch 'hpgmp-dev' of https://github.com/jennloe/hpcg into hpgm…
iyamazaki Mar 1, 2022
2cb4611
fixes after merge
iyamazaki Mar 2, 2022
7972830
"compact" version of GS with Cuda, plus some cleanups (e.g., memory f…
iyamazaki Mar 2, 2022
f278623
reference implementations for GEMV & GEMVT
iyamazaki Mar 2, 2022
49c3149
CGS2 for GMRES-IR
iyamazaki Mar 2, 2022
925d8b8
use cudaMemset, instead of cublasXscal, for ZeroVector (since scaling…
iyamazaki Mar 2, 2022
1fa84e1
With Cuda, just copy back non-local part of vector to device after Ha…
iyamazaki Mar 3, 2022
9a8909e
add Gflop/s for IR, and fixing flops for MG
iyamazaki Mar 7, 2022
520e6d4
Renamed file with parameter struct. Cleaned up code and comments. Rem…
jennloe Mar 7, 2022
8277249
Fix build errors.
jennloe Mar 7, 2022
43c726d
Working on the convergence verification prob. Likely doesn't build r…
jennloe Mar 8, 2022
b6de5aa
just trying to call rocBLAS
iyamazaki Mar 8, 2022
b58e719
fix CUDA build
iyamazaki Mar 8, 2022
d7beb92
Merge remote-tracking branch 'upstream/hpgmp-cuda' into hpgmp-dev
jennloe Mar 8, 2022
22 changes: 16 additions & 6 deletions Makefile
Original file line number Diff line number Diff line change
@@ -11,13 +11,19 @@ HPCG_DEPS = src/ComputeResidual.o \
src/CheckProblem.o \
src/OptimizeProblem.o src/ReadHpcgDat.o src/ReportResults.o \
src/SetupHalo.o src/SetupHalo_ref.o src/TestSymmetry.o src/TestNorms.o src/WriteProblem.o \
src/YAML_Doc.o src/YAML_Element.o src/ComputeDotProduct.o \
src/ComputeDotProduct_ref.o src/finalize.o src/init.o src/mytimer.o src/ComputeSPMV.o \
src/ComputeSPMV_ref.o src/ComputeWAXPBY.o src/ComputeWAXPBY_ref.o \
src/ComputeMG_ref.o src/ComputeMG.o src/ComputeProlongation_ref.o src/ComputeRestriction_ref.o \
src/YAML_Doc.o src/YAML_Element.o \
src/ComputeDotProduct.o src/ComputeDotProduct_ref.o \
src/finalize.o src/init.o src/mytimer.o \
src/ComputeSPMV.o src/ComputeSPMV_ref.o \
src/ComputeSYMGS.o src/ComputeSYMGS_ref.o \
src/ComputeWAXPBY.o src/ComputeWAXPBY_ref.o \
src/ComputeMG_ref.o src/ComputeMG.o \
src/ComputeProlongation_ref.o src/ComputeRestriction_ref.o \
src/ComputeOptimalShapeXYZ.o src/MixedBaseCounter.o src/CheckAspectRatio.o src/OutputFile.o \
\
src/TestGMRES.o src/ComputeTRSM.o src/ComputeGEMV.o \
src/TestGMRES.o src/ComputeTRSM.o src/ComputeGEMV.o src/ComputeGEMVT.o \
src/ComputeGEMV.o src/ComputeGEMV_ref.o \
src/ComputeGEMVT.o src/ComputeGEMVT_ref.o \
src/GMRES.o src/GMRES_IR.o \
src/ComputeGS_Forward.o src/ComputeGS_Forward_ref.o \
src/SetupProblem.o \
@@ -27,9 +33,13 @@ HPCG_DEPS = src/ComputeResidual.o \
bin/xhpgmp: src/main_hpgmp.o $(HPCG_DEPS)
$(LINKER) $(LINKFLAGS) src/main_hpgmp.o $(HPCG_DEPS) -o bin/xhpgmp $(HPCG_LIBS)

bin/xhpgmp_time: src/main_time.o $(HPCG_DEPS)
$(LINKER) $(LINKFLAGS) src/main_time.o $(HPCG_DEPS) -o bin/xhpgmp_time $(HPCG_LIBS)

clean:
rm -f $(HPCG_DEPS) \
bin/xhpgmp src/main_hpgmp.o
bin/xhpgmp src/main_hpgmp.o \
bin/xhpgmp_time src/main_time.o

.PHONY: clean

33 changes: 28 additions & 5 deletions Makefile.ext
@@ -27,6 +27,8 @@ HPCG_DEPS = src/ComputeResidual.o \
src/ComputeOptimalShapeXYZ.o \
src/ComputeSPMV.o \
src/ComputeSPMV_ref.o \
src/ComputeSYMGS.o \
src/ComputeSYMGS_ref.o \
src/ComputeWAXPBY.o \
src/ComputeWAXPBY_ref.o \
src/ComputeMG_ref.o \
@@ -45,28 +47,37 @@ HPCG_DEPS = src/ComputeResidual.o \
src/ComputeGS_Forward_ref.o \
src/ComputeTRSM.o \
src/ComputeGEMV.o \
src/SetupProblem.o \
src/ComputeGEMV_ref.o \
src/ComputeGEMVT.o \
src/ComputeGEMVT_ref.o \
src/SetupProblem.o \
src/GenerateNonsymProblem.o \
src/GenerateNonsymProblem_v1_ref.o \
src/GenerateNonsymCoarseProblem.o \

# These header files are included in many source files, so we recompile every file if one or more of these header is modified.
PRIMARY_HEADERS = HPCG_SRC_PATH/src/Geometry.hpp HPCG_SRC_PATH/src/SparseMatrix.hpp HPCG_SRC_PATH/src/Vector.hpp HPCG_SRC_PATH/src/CGData.hpp \
HPCG_SRC_PATH/src/MGData.hpp HPCG_SRC_PATH/src/hpcg.hpp
PRIMARY_HEADERS = HPCG_SRC_PATH/src/Geometry.hpp HPCG_SRC_PATH/src/SparseMatrix.hpp HPCG_SRC_PATH/src/Vector.hpp HPCG_SRC_PATH/src/MultiVector.hpp \
HPCG_SRC_PATH/src/CGData.hpp HPCG_SRC_PATH/src/MGData.hpp HPCG_SRC_PATH/src/Hpgmp_Params.hpp

all: bin/xhpgmp
all: bin/xhpgmp bin/xhpgmp_time

bin/xhpgmp: src/main_hpgmp.o $(HPCG_DEPS)
$(LINKER) $(LINKFLAGS) src/main_hpgmp.o $(HPCG_DEPS) $(HPCG_LIBS) -o bin/xhpgmp

bin/xhpgmp_time: src/main_time.o $(HPCG_DEPS)
$(LINKER) $(LINKFLAGS) src/main_time.o $(HPCG_DEPS) $(HPCG_LIBS) -o bin/xhpgmp_time

clean:
rm -f src/*.o bin/xhpgmp
rm -f src/*.o bin/xhpgmp bin/xhpgmp_time

.PHONY: all clean

src/main_hpgmp.o: HPCG_SRC_PATH/src/main_hpgmp.cpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

src/main_time.o: HPCG_SRC_PATH/src/main_time.cpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

src/ComputeResidual.o: HPCG_SRC_PATH/src/ComputeResidual.cpp HPCG_SRC_PATH/src/ComputeResidual.hpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

@@ -139,6 +150,9 @@ src/ComputeSPMV_ref.o: HPCG_SRC_PATH/src/ComputeSPMV_ref.cpp HPCG_SRC_PATH/src/C
src/ComputeSYMGS.o: HPCG_SRC_PATH/src/ComputeSYMGS.cpp HPCG_SRC_PATH/src/ComputeSYMGS.hpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

src/ComputeSYMGS_ref.o: HPCG_SRC_PATH/src/ComputeSYMGS_ref.cpp HPCG_SRC_PATH/src/ComputeSYMGS_ref.hpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

src/ComputeWAXPBY.o: HPCG_SRC_PATH/src/ComputeWAXPBY.cpp HPCG_SRC_PATH/src/ComputeWAXPBY.hpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

@@ -185,6 +199,15 @@ src/ComputeTRSM.o: HPCG_SRC_PATH/src/ComputeTRSM.cpp HPCG_SRC_PATH/src/ComputeTR
src/ComputeGEMV.o: HPCG_SRC_PATH/src/ComputeGEMV.cpp HPCG_SRC_PATH/src/ComputeGEMV.hpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

src/ComputeGEMV_ref.o: HPCG_SRC_PATH/src/ComputeGEMV_ref.cpp HPCG_SRC_PATH/src/ComputeGEMV_ref.hpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

src/ComputeGEMVT.o: HPCG_SRC_PATH/src/ComputeGEMVT.cpp HPCG_SRC_PATH/src/ComputeGEMVT.hpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

src/ComputeGEMVT_ref.o: HPCG_SRC_PATH/src/ComputeGEMVT_ref.cpp HPCG_SRC_PATH/src/ComputeGEMVT_ref.hpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

src/SetupProblem.o: HPCG_SRC_PATH/src/SetupProblem.cpp HPCG_SRC_PATH/src/SetupProblem.hpp $(PRIMARY_HEADERS)
$(CXX) -c $(CXXFLAGS) -IHPCG_SRC_PATH/src $< -o $@

5 changes: 5 additions & 0 deletions QUICKSTART
@@ -37,6 +37,11 @@ NOTE: The instructions in this file assume you are working with a version
export OMP_NUM_THREADS=4
mpiexec -np 64 ./xhpcg

5) To set parameters, hpcg.dat ... First two lines ignored.
Line 3: nx ny nz
Line 4: Time to run the benchmark (seconds)
Line 5:

5) The benchmark has completed execution. This should take a few minutes
when running in evaluation mode, and take about 30 minutes in official
benchmark mode. If you are running on a production system, you may be able
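
Review note: the hpcg.dat layout described in the new QUICKSTART step (two ignored lines, then `nx ny nz`, then the run time in seconds) could be read roughly as sketched below. This is only an illustration under stated assumptions: `DatParams`, `ReadDat`, and `GridPoints` are hypothetical names, not functions from this branch, and because the text leaves line 5 unspecified the sketch does not read it.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical container for the values the QUICKSTART text names.
struct DatParams {
  int nx = 0, ny = 0, nz = 0;  // line 3: grid dimensions
  double runSeconds = 0.0;     // line 4: time to run the benchmark (seconds)
};

// Reads the described layout from any input stream.
// Returns false if a line is missing, malformed, or out of range.
bool ReadDat(std::istream& in, DatParams& out) {
  std::string line;
  // Lines 1 and 2 are ignored.
  for (int i = 0; i < 2; ++i) {
    if (!std::getline(in, line)) return false;
  }
  // Line 3: nx ny nz
  if (!std::getline(in, line)) return false;
  std::istringstream dims(line);
  if (!(dims >> out.nx >> out.ny >> out.nz)) return false;
  // Line 4: run time in seconds
  if (!std::getline(in, line)) return false;
  std::istringstream secs(line);
  if (!(secs >> out.runSeconds)) return false;
  // Line 5 is not specified in the text, so it is deliberately not parsed.
  return out.nx > 0 && out.ny > 0 && out.nz > 0 && out.runSeconds >= 0.0;
}

// Number of grid points implied by the three dimensions.
long long GridPoints(const DatParams& p) {
  assert(p.nx > 0 && p.ny > 0 && p.nz > 0);
  return static_cast<long long>(p.nx) * p.ny * p.nz;
}
```

Taking a `std::istream&` rather than a file name keeps the sketch testable with an in-memory string. For a 16 x 16 x 16 grid, `GridPoints` gives 4096, which agrees with the `Number of Equations=4096` entry in the single-process report file included in this PR.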
112 changes: 112 additions & 0 deletions bin/HPGMP-Benchmark_1.1_2022-03-07_16-04-57.txt
@@ -0,0 +1,112 @@
HPGMP-Benchmark
version=1.1
Release date=March 28, 2019
Machine Summary=
Machine Summary::Distributed Processes=1
Machine Summary::Threads per processes=1
Global Problem Dimensions=
Global Problem Dimensions::Global nx=16
Global Problem Dimensions::Global ny=16
Global Problem Dimensions::Global nz=16
Processor Dimensions=
Processor Dimensions::npx=1
Processor Dimensions::npy=1
Processor Dimensions::npz=1
Local Domain Dimensions=
Local Domain Dimensions::nx=16
Local Domain Dimensions::ny=16
Local Domain Dimensions::Lower ipz=0
Local Domain Dimensions::Upper ipz=0
Local Domain Dimensions::nz=16
########## Problem Summary ##########=
Setup Information=
Setup Information::Setup Time=0.002537
Linear System Information=
Linear System Information::Number of Equations=4096
Linear System Information::Number of Nonzero Terms=97336
Multigrid Information=
Multigrid Information::Number of coarse grid levels=3
Multigrid Information::Coarse Grids=
Multigrid Information::Coarse Grids::Grid Level=1
Multigrid Information::Coarse Grids::Number of Equations=512
Multigrid Information::Coarse Grids::Number of Nonzero Terms=10648
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
Multigrid Information::Coarse Grids::Grid Level=2
Multigrid Information::Coarse Grids::Number of Equations=64
Multigrid Information::Coarse Grids::Number of Nonzero Terms=1000
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
Multigrid Information::Coarse Grids::Grid Level=3
Multigrid Information::Coarse Grids::Number of Equations=8
Multigrid Information::Coarse Grids::Number of Nonzero Terms=64
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
########## Memory Use Summary ##########=
Memory Use Information=
Memory Use Information::Total memory used for data (Gbytes)=0.00292882
Memory Use Information::Memory used for OptimizeProblem data (Gbytes)=0
Memory Use Information::Bytes per equation (Total memory / Number of Equations)=715.045
Memory Use Information::Memory used for linear system and CG (Gbytes)=0.00257652
Memory Use Information::Coarse Grids=
Memory Use Information::Coarse Grids::Grid Level=1
Memory Use Information::Coarse Grids::Memory used=0.000308152
Memory Use Information::Coarse Grids::Grid Level=2
Memory Use Information::Coarse Grids::Memory used=3.8904e-05
Memory Use Information::Coarse Grids::Grid Level=3
Memory Use Information::Coarse Grids::Memory used=5.248e-06
########## V&V Testing Summary ##########=
Spectral Convergence Tests=
Spectral Convergence Tests::Result=FAILED
Spectral Convergence Tests::Unpreconditioned=
Spectral Convergence Tests::Unpreconditioned::Maximum iteration count=21
Spectral Convergence Tests::Unpreconditioned::Expected iteration count=12
Spectral Convergence Tests::Preconditioned=
Spectral Convergence Tests::Preconditioned::Maximum iteration count=3
Spectral Convergence Tests::Preconditioned::Expected iteration count=2
########## Iterations Summary ##########=
Iteration Count Information=
Iteration Count Information::Result=PASSED
Iteration Count Information::Reference CG iterations per set=50
Iteration Count Information::Optimized CG iterations per set=500
Iteration Count Information::Total number of reference iterations=50
Iteration Count Information::Total number of optimized iterations=500
########## Reproducibility Summary ##########=
Reproducibility Information=
Reproducibility Information::Result=FAILED
Reproducibility Information::Scaled residual mean=2.122e-314
Reproducibility Information::Scaled residual variance=2.122e-314
########## Performance Summary (times in sec) ##########=
Benchmark Time Summary=
Benchmark Time Summary::Optimization phase=0
Benchmark Time Summary::DDOT=0
Benchmark Time Summary::WAXPBY=0
Benchmark Time Summary::SpMV=0
Benchmark Time Summary::MG=0
Benchmark Time Summary::Total=0
Floating Point Operations Summary=
Floating Point Operations Summary::Raw DDOT=1.22962e+07
Floating Point Operations Summary::Raw WAXPBY=1.22962e+07
Floating Point Operations Summary::Raw SpMV=9.75307e+07
Floating Point Operations Summary::Raw MG=5.45048e+08
Floating Point Operations Summary::Total=6.67171e+08
Floating Point Operations Summary::Total with convergence overhead=6.67171e+07
GB/s Summary=
GB/s Summary::Raw Read B/W=inf
GB/s Summary::Raw Write B/W=inf
GB/s Summary::Raw Total B/W=inf
GB/s Summary::Total with convergence and optimization phase overhead=2002.73
GFLOP/s Summary=
GFLOP/s Summary::Raw DDOT=inf
GFLOP/s Summary::Raw WAXPBY=inf
GFLOP/s Summary::Raw SpMV=inf
GFLOP/s Summary::Raw MG=inf
GFLOP/s Summary::Raw Total=inf
GFLOP/s Summary::Total with convergence overhead=inf
GFLOP/s Summary::Total with convergence and optimization phase overhead=262.976
User Optimization Overheads=
User Optimization Overheads::Optimization phase time (sec)=0
User Optimization Overheads::Optimization phase time vs reference SpMV+MG time=nan
Final Summary=
Final Summary::HPCG result is=INVALID.
Final Summary::Please review the YAML file contents=You may NOT submit these results for consideration.
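
Review note: every entry under Benchmark Time Summary in the report above is 0, while the raw GB/s and GFLOP/s entries are `inf` and one ratio is `nan`. The sketch below shows how a plain floating-point division yields those values when the recorded time is zero, assuming IEEE 754 doubles. It is not the project's reporting code: `RawGflops` and `GuardedGflops` are hypothetical helpers, and whether the `nan` in the report really comes from a 0/0 division is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "the inf/nan behaviour below assumes IEEE 754 doubles");

// Unguarded rate: a zero time yields inf, and 0 flops over 0 seconds yields nan.
double RawGflops(double flops, double seconds) {
  return flops / seconds / 1.0e9;
}

// Hypothetical guarded variant: reports 0 when no positive time was recorded.
double GuardedGflops(double flops, double seconds) {
  assert(flops >= 0.0);
  if (!(seconds > 0.0)) return 0.0;  // also rejects nan timings
  return flops / seconds / 1.0e9;
}
```

With the report's `Raw DDOT=1.22962e+07` flops and a time of 0, `RawGflops` returns `inf`, whereas `GuardedGflops` returns 0. Returning 0 is one possible convention; skipping the entry or flagging it in the output would serve equally well.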
4 changes: 0 additions & 4 deletions bin/hpcg.dat

This file was deleted.
