The code and data in this repository were used for the paper "Input-Dependent Power Usage in GPUs" published in SC'24 Sustainable Supercomputing workshop.
-
Install Miniconda and make a python environment (e.g.,
conda create -n gemm python
). -
Activate the conda environment (e.g.,
conda activate gemm
) and install necessary packages (e.g.,pip install cmake
andpip install numpy
). -
Install Cuda Toolkit 12.2+ and set environment variable
CUDACXX
to point to thecuda-x.y/bin/nvcc
directory (or update path). -
Install Nvidia DCGM and enable the DCGM systemd service.
-
These experiments stem from the Cutlass examples. Clone the Cutlass repo.
-
Move or copy
matrices.h
to thecutlass/include/cutlass/gemm/
directory. -
Move or copy
basic_gemm_expt.cu
andother_expt.cu
to thecutlass/examples/00_basic_gemm/
directory. -
Add the following to the
cutlass/examples/00_basic_gemm/CMakeLists.txt
file:cutlass_example_add_executable( 00_basic_gemm_expt basic_gemm_expt.cu ) cutlass_example_add_executable( other_expt other_expt.cu )
-
You might need to comment out the
swap
method incutlass/include/cutlass/fast_math.h
due to overrides.
Add a build folder in the main cutlass/
directory.
mkdir build & cd build
Standard compile for A100. Change the arch code to compile for other platforms, or see Cutlass documentation for more options. The arch code for A100 is 80, H100 is 90a, and RTX 6000 is 75 (this is a useful website on NVIDIA arch codes).
cmake .. -DCUTLASS_NVCC_ARCHS=80
If you want to use tensor cores, compile with:
cmake .. -DCUTLASS_NVCC_ARCHS=80 -DCUTLASS_F16C_ENABLED=ON -DCUTLASS_LIBRARY_KERNELS=tensorop
Navigate to the build and make.
cd examples/00_basic_gemm && make
To change the datatype used in the experiments, update the DTYPE
field at the top of the matrices.h
file. Note that all RVs are initially generated as floats and then converted to DTYPE
with the Cutlass NumericConverter. The default float rounding style is round to nearest.
The experiment script is 00_basic_gemm_expt
, located in cutlass/build/examples/00_basic_gemm
. It has the following usage:
./00_basic_gemm_expt <M> <N> <K> <alpha> <beta> <iterations> <seed> <pattern> <output path> <pattern parameters>
Example parameters for each pattern can be seen in paper_expts.py
. The paper experiments are also in paper_expts.py
.
To run experiment 13 (no transpose on B), set the transpose parameter of the B matrix to be false
in the TestCutlassGemm
function of basic_gemm_expt.cu
.
To run experiments with tensor cores enabled, uncomment the using MMAOp
and using SmArch
statements in the CutlassSgemmNN
function of basic_gemm_expt.cu
. Add the following parameters to the subsequent using CutlassGemm
statement: DTYPE, MMAOp, SmArch
.