KTT is a C++ tuning library for OpenCL and CUDA kernels. The project is currently in a late beta stage, with all of the baseline functionality available.
- Ability to define kernel tuning parameters such as thread count, vector data types and loop unroll factors in order to optimize computation for a particular device
- Support for iterative kernel launches and composite kernels
- Ability to automatically ensure correctness of tuned computation with a reference kernel or C++ function
- Support for 2 distinct modes: find the best kernel configuration for a device in tuning mode, then launch the optimized kernel repeatedly for different inputs in computation mode with very low overhead
- Support for multiple compute APIs in a single library; switching between CUDA and OpenCL requires only minor changes in C++ code (e.g. changing the kernel source file), and no library recompilation is needed
- Large number of customization options, including the ability to specify a custom tolerance threshold for floating-point argument validation, the ability to change kernel compiler flags and more
- No direct usage of vendor-specific SDKs is needed; only the corresponding device drivers have to be installed
- Documentation for KTT API can be found here.
- Newest version of KTT library can be found here.
- Prebuilt binaries are currently available only for some platforms. Other platforms require manual build.
- Prebuilt binaries for Nvidia include both CUDA and OpenCL support, binaries for AMD and Intel include only OpenCL support.
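The feature list above mentions validating tuned output against a reference with a custom floating-point tolerance. The following is a minimal sketch of what such a check can look like, assuming a simple absolute-difference comparison. The function `validateWithTolerance` is a hypothetical helper written for illustration; it is not part of the KTT API, and KTT's actual validation method may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a KTT function): returns true when every element of
// `result` lies within `tolerance` of the matching element of `reference`.
bool validateWithTolerance(const std::vector<float>& result,
                           const std::vector<float>& reference,
                           float tolerance)
{
    // Buffers of different length can never match.
    if (result.size() != reference.size())
    {
        return false;
    }

    for (std::size_t i = 0; i < result.size(); ++i)
    {
        // Absolute-difference check. Written as a negated "<=" so that a NaN
        // in either buffer is treated as a validation failure.
        if (!(std::fabs(result[i] - reference[i]) <= tolerance))
        {
            return false;
        }
    }

    return true;
}
```

A tuned kernel's output would be accepted only when this check passes against the output of the reference kernel or C++ function; a looser tolerance admits configurations whose rounding behaviour differs slightly from the reference.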
Examples showcasing KTT functionality are located inside the examples folder. List of currently available examples:
- compute_api_info (OpenCL / CUDA): basic example showing how to retrieve detailed information about compute API platforms and devices through KTT API
- simple (OpenCL / CUDA): basic example showing how to run simple kernel with KTT framework, utilizes reference class, no actual autotuning is done
- coulomb_sum_2d (OpenCL): advanced example which utilizes large number of tuning parameters, thread modifiers and constraints
- coulomb_sum_3d_iterative (OpenCL): 3D version of previous example, utilizes tuning manipulator to iteratively launch 2D kernel
- coulomb_sum_3d (OpenCL): alternative to iterative version, utilizes several tuning parameters and reference kernel
- nbody (OpenCL): advanced example which utilizes tuning parameters, multiple constraints and validation of multiple arguments with reference kernel
- reduction (OpenCL): advanced example which utilizes reference class, tuning manipulator and several tuning parameters
KTT can be built as a dynamic (shared) library using the command line build tool Premake. Currently supported operating systems are Linux and Windows.
The prerequisites to build KTT are:
- C++14 compiler, for example Clang 3.4, GCC 5.0, MSVC 19.0 (Visual Studio 2015) or newer
- OpenCL or CUDA library, supported SDKs are AMD APP SDK 3.0, Intel SDK for OpenCL and NVIDIA CUDA Toolkit 7.5 or newer
- Premake 5 (alpha 12 or newer)
Build under Linux (inside KTT root folder):
- ensure that path to vendor SDK is correctly set in the environment variables
- run `./premake5 gmake` to generate makefile
- run `cd build` to get inside build directory
- afterwards run `make config={configuration}_{architecture}` to build the project (e.g. `make config=release_x86_64`)
Build under Windows (inside KTT root folder):
- ensure that path to vendor SDK is correctly set in the environment variables, this should be done automatically during SDK installation
- run `premake5.exe vs2015` (or `premake5.exe vs2017`) to generate Visual Studio project files
- open generated solution file and build the project inside Visual Studio
Following build options are available:
- `--outdir=path` specifies custom build directory, default build directory is `build`
- `--platform=vendor` specifies SDK used for building KTT, useful when multiple SDKs are installed
- `--no-examples` disables compilation of examples
- `--tests` enables compilation of unit tests
- `--no-cuda` disables inclusion of CUDA API during compilation, only affects Nvidia platform
- `--vulkan` enables inclusion of Vulkan API during compilation, note that Vulkan is not fully supported yet
KTT is based on the CLTune project. Some parts of the KTT API are similar to the CLTune API; however, the internal structure was almost completely rewritten from scratch. Portions of code for the following features were ported from CLTune:
- PSO and annealing searcher
- Generation of kernel configurations
- Tuning parameter constraints
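To illustrate what generating kernel configurations under constraints means, here is a minimal sketch that builds the Cartesian product of tuning parameter values and then discards combinations rejected by a constraint. All names in it (`Parameter`, `Configuration`, `Constraint`, `generateConfigurations`) are hypothetical and chosen for this example; the sketch does not reproduce the KTT or CLTune implementation.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration only; these are not KTT types.
// A configuration maps each tuning parameter name to one chosen value.
using Configuration = std::map<std::string, std::size_t>;

// A constraint accepts or rejects a complete configuration.
using Constraint = std::function<bool(const Configuration&)>;

struct Parameter
{
    std::string name;
    std::vector<std::size_t> values;
};

std::vector<Configuration> generateConfigurations(const std::vector<Parameter>& parameters,
                                                  const std::vector<Constraint>& constraints)
{
    // Start from a single empty configuration and extend it by one parameter
    // at a time, which yields the Cartesian product of all value lists.
    std::vector<Configuration> partial{Configuration{}};
    for (const auto& parameter : parameters)
    {
        std::vector<Configuration> extended;
        for (const auto& configuration : partial)
        {
            for (const std::size_t value : parameter.values)
            {
                Configuration next = configuration;
                next[parameter.name] = value;
                extended.push_back(next);
            }
        }
        partial = extended;
    }

    // Keep only the configurations that satisfy every constraint.
    std::vector<Configuration> valid;
    for (const auto& configuration : partial)
    {
        bool accepted = true;
        for (const auto& constraint : constraints)
        {
            if (!constraint(configuration))
            {
                accepted = false;
                break;
            }
        }
        if (accepted)
        {
            valid.push_back(configuration);
        }
    }
    return valid;
}
```

For example, with a hypothetical `WORK_GROUP_SIZE` parameter taking values 64, 128 and 256, an `UNROLL_FACTOR` parameter taking values 1, 2 and 4, and a constraint requiring their product to be at most 256, the function returns 6 of the 9 possible combinations. A searcher would then pick which of the remaining configurations to benchmark.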