This repo demonstrates how to use pybind11 to provide python interface for highly optimized CUDA code. It only contains basic functionality. Present work uses modern CMake/Cuda and Yujia Zhai's GEMV implementention approach. CmakeLists comes from pkestene
This project requires CMake>=3.18, it can be built with code below:
git clone --recurse-submodules https://github.com/dongdongban/gemm-pybind-learning
cmake -S . -B build -DCMAKE_CUDA_ARCHITECTURES="75" && cd build
cmake --build .
The device architecture "sm_75" should be replaced by your native GPU capability.
if Nothing went wrong, check your module with these codes:
cd Optimizing-SGEMV-on-NVIDIA-GPUs
python -c 'import mygemm; mygemm.host(4096, 4096, 1); mygemm.host(4096, 4096, 2)'