rename nvprof-connector to nvtx-connector, add 3 more hooks (#139)
Rename NVPROF connector to NVTX Connector

Make the connector fence by default, and use an environment variable to control fencing.
cwpearson authored Aug 24, 2023
1 parent ef39361 commit bfdfdae
Showing 9 changed files with 109 additions and 122 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -179,7 +179,7 @@ endif()

# GPU profilers
if(Kokkos_ENABLE_CUDA)
add_subdirectory(profiling/nvprof-connector)
add_subdirectory(profiling/nvtx-connector)
add_subdirectory(profiling/nvprof-focused-connector)
endif()
if(Kokkos_ENABLE_HIP)
15 changes: 9 additions & 6 deletions README.md
@@ -8,7 +8,7 @@ Note: `Kokkos` must be configured with `Kokkos_ENABLE_LIBDL=ON` to load profilin

## General Usage

To use one of the tools you have to compile it, which will generate a dynamic library. Before executing the Kokkos application you then have to set the environment variable `KOKKOS_TOOLS_LIBS` to point to the dynamic library e.g. in the `bash` shell:
To use one of the tools you have to compile it, which will generate a dynamic library. Before executing the Kokkos application you then have to set the environment variable `KOKKOS_TOOLS_LIBS` to point to the dynamic library e.g. in the `bash` shell:
```
export KOKKOS_TOOLS_LIBS=${HOME}/kokkos-tools/src/tools/memory-events/kp_memory_event.so
```
@@ -69,18 +69,21 @@ The following provides an overview of the tools available in the set of Kokkos T

Like VTuneConnector but turns profiling off outside of kernels. Should be used in conjunction with the KernelFilter tool.

+ [**NVTXConnector:**](https://github.com/kokkos/kokkos-tools/wiki/NVTXConnector)

Provides Kokkos Kernel Names to NVTX, so that analysis can be performed on a per kernel base.

+ [**Timemory:**](https://github.com/kokkos/kokkos-tools/wiki/Timemory)

Modular connector for accumulating timing, memory usage, hardware counters, and other various metrics.
Supports controlling VTune, CUDA profilers, and TAU + kernel name forwarding to VTune, NVTX, TAU,
Caliper, and LIKWID.

##### If you need to write your own plug-in, this provides a straight-forward API to writing the plug-in.
##### If you need to write your own plug-in, this provides a straight-forward API to writing the plug-in.

Defining a timemory component will enable your plug-in to output to stdout, text, and JSON,
accumulate statistics, and utilize various portable function calls for common needs w.r.t. timers,
resource usage, etc.


# Building Kokkos Tools

@@ -91,18 +94,18 @@ Use either CMake or Makefile to build Kokkos Tools.
1. create a build directory in Kokkos Tools, e.g., type `mkdir myBuild; cd myBuild`
2. To configure the Type `ccmake ..` for any options you would like to enable/disable.
3. To compile, type `make`
4. To install, type `make install`
4. To install, type `make install`

## Using make

To build with make, simply type `make` within each subdirectory of Kokkos Tools.


Building with Makefiles is currently recommended.
Building using `make` is currently recommended. Eventually, the preferred method of building will be `cmake`.

# Running a Kokkos-based Application with a tool

Given your tool shared library <name_of_tool_shared_library>.so (which contains kokkos profiling callback functions) and an application executable called yourApplication.exe, type:
Given your tool shared library `<name_of_tool_shared_library>.so` (which contains kokkos profiling callback functions) and an application executable called yourApplication.exe, type:

`export KOKKOS_TOOLS_LIBS=${YOUR_KOKKOS_TOOLS_DIR}/<name_of_tool_shared_lib>; ./yourApplication.exe`
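
For orientation, the "profiling callback functions" mentioned above are plain C-linkage `kokkosp_*` functions that Kokkos looks up in the library named by `KOKKOS_TOOLS_LIBS`. The following is a minimal sketch, not one of the shipped tools: it assumes a hypothetical tool that only counts `parallel_for` launches, and `kernel_count` and the printed messages are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical tool state: number of parallel_for kernels seen so far.
static uint64_t kernel_count = 0;

extern "C" {

// Called once when Kokkos loads the tool library.
void kokkosp_init_library(const int loadSeq, const uint64_t interfaceVer,
                          const uint32_t /*devInfoCount*/,
                          void* /*deviceInfo*/) {
  std::printf("demo tool loaded (sequence %d, interface %llu)\n", loadSeq,
              static_cast<unsigned long long>(interfaceVer));
}

// Called before each parallel_for; the tool chooses the kernel ID.
void kokkosp_begin_parallel_for(const char* name, const uint32_t /*devID*/,
                                uint64_t* kID) {
  assert(kID != nullptr);
  *kID = kernel_count++;
  std::printf("begin parallel_for '%s' (id %llu)\n", name,
              static_cast<unsigned long long>(*kID));
}

// Called after the kernel; receives the ID written by the begin callback.
void kokkosp_end_parallel_for(const uint64_t kID) {
  std::printf("end parallel_for (id %llu)\n",
              static_cast<unsigned long long>(kID));
}

// Called once when Kokkos shuts down.
void kokkosp_finalize_library() {
  std::printf("demo tool saw %llu parallel_for kernels\n",
              static_cast<unsigned long long>(kernel_count));
}

}  // extern "C"
```

Built with `-shared -fPIC` (the flags the connector Makefile in this commit uses), the resulting `.so` could then be passed through `KOKKOS_TOOLS_LIBS` as shown above.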

4 changes: 2 additions & 2 deletions profiling/all/kp_all.cpp
@@ -49,7 +49,7 @@ KOKKOSTOOLS_EXTERN_EVENT_SET(VTuneFocusedConnector)
KOKKOSTOOLS_EXTERN_EVENT_SET(VariorumConnector)
#endif
#ifdef KOKKOSTOOLS_HAS_NVPROF
KOKKOSTOOLS_EXTERN_EVENT_SET(NVProfConnector)
KOKKOSTOOLS_EXTERN_EVENT_SET(NVTXConnector)
KOKKOSTOOLS_EXTERN_EVENT_SET(NVProfFocusedConnector)
#endif
#ifdef KOKKOSTOOLS_HAS_CALIPER
@@ -91,7 +91,7 @@ EventSet get_event_set(const char* profiler, const char* config_str) {
handlers["caliper"] = cali::get_kokkos_event_set(config_str);
#endif
#ifdef KOKKOSTOOLS_HAS_NVPROF
handlers["nvprof-connector"] = NVProfConnector::get_event_set();
handlers["nvtx-connector"] = NVTXConnector::get_event_set();
handlers["nvprof-focused-connector"] =
NVProfFocusedConnector::get_event_set();
#endif
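
Because connectors are selected by string key, the rename also changes the name a caller has to pass. The following sketch illustrates that lookup under stated assumptions: `DemoEventSet`, `DemoNVTXConnector`, and `demo_get_event_set` are hypothetical, cut-down stand-ins for the real types in `kp_all.cpp`, and returning an empty set for an unknown key is a choice made for this example only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, cut-down stand-in for Kokkos::Tools::Experimental::EventSet.
struct DemoEventSet {
  void (*init)()     = nullptr;
  void (*finalize)() = nullptr;
};

namespace DemoNVTXConnector {
void init() {}
void finalize() {}
DemoEventSet get_event_set() {
  DemoEventSet es;  // callbacks not assigned below stay null
  es.init     = init;
  es.finalize = finalize;
  return es;
}
}  // namespace DemoNVTXConnector

// Look up a connector by key; unknown keys yield an empty event set.
DemoEventSet demo_get_event_set(const std::string& profiler) {
  std::map<std::string, DemoEventSet> handlers;
  handlers["nvtx-connector"] = DemoNVTXConnector::get_event_set();
  auto it = handlers.find(profiler);
  return it != handlers.end() ? it->second : DemoEventSet{};
}

// Usage: the new key resolves, the retired key no longer does.
void demo_usage() {
  assert(demo_get_event_set("nvtx-connector").init != nullptr);
  assert(demo_get_event_set("nvprof-connector").init == nullptr);
}
```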
13 changes: 13 additions & 0 deletions profiling/all/kp_core.hpp
@@ -46,6 +46,8 @@ using Kokkos::Tools::SpaceHandle;
#define EXPOSE_STOP_PROFILE_SECTION(FUNC_NAME)
#define EXPOSE_DESTROY_PROFILE_SECTION(FUNC_NAME)
#define EXPOSE_PROFILE_EVENT(FUNC_NAME)
#define EXPOSE_BEGIN_FENCE(FUNC_NAME)
#define EXPOSE_END_FENCE(FUNC_NAME)

#else

@@ -165,6 +167,17 @@ using Kokkos::Tools::SpaceHandle;
FUNC_NAME(name); \
}

#define EXPOSE_BEGIN_FENCE(FUNC_NAME) \
__attribute__((weak)) void kokkosp_begin_fence( \
const char* name, const uint32_t deviceId, uint64_t* handle) { \
FUNC_NAME(name, deviceId, handle); \
}

#define EXPOSE_END_FENCE(FUNC_NAME) \
__attribute__((weak)) void kokkosp_end_fence(uint64_t handle) { \
FUNC_NAME(handle); \
}

#define EXPOSE_DUAL_VIEW_SYNC(FUNC_NAME) \
__attribute__((weak)) void kokkosp_dual_view_sync( \
const char* name, const void* const ptr, bool is_device) { \
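
The two new macros follow the same forwarding pattern as the existing ones: each defines the C-linkage `kokkosp_*` entry point and forwards to the tool's namespaced function. The sketch below shows the pattern in isolation. `DEMO_EXPOSE_BEGIN_FENCE`, `DEMO_EXPOSE_END_FENCE`, and the `demo_tool` namespace are hypothetical names for illustration, and `__attribute__((weak))` assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical copies of the forwarding macros: each one defines the
// entry point Kokkos looks up and passes the arguments to FUNC_NAME.
#define DEMO_EXPOSE_BEGIN_FENCE(FUNC_NAME)                         \
  __attribute__((weak)) void kokkosp_begin_fence(                  \
      const char* name, const uint32_t deviceId, uint64_t* handle) { \
    FUNC_NAME(name, deviceId, handle);                             \
  }

#define DEMO_EXPOSE_END_FENCE(FUNC_NAME)                          \
  __attribute__((weak)) void kokkosp_end_fence(uint64_t handle) { \
    FUNC_NAME(handle);                                            \
  }

namespace demo_tool {  // hypothetical tool implementation
static uint64_t next_handle = 1;
static uint64_t last_ended  = 0;

void begin_fence(const char* name, const uint32_t /*deviceId*/,
                 uint64_t* handle) {
  assert(name != nullptr && handle != nullptr);
  *handle = next_handle++;  // value is handed back to end_fence
}

void end_fence(uint64_t handle) { last_ended = handle; }
}  // namespace demo_tool

extern "C" {
DEMO_EXPOSE_BEGIN_FENCE(demo_tool::begin_fence)
DEMO_EXPOSE_END_FENCE(demo_tool::end_fence)
}  // extern "C"
```

Marking the entry points weak means that a strong definition of the same symbol elsewhere in the link takes precedence instead of producing a multiple-definition error.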
4 changes: 0 additions & 4 deletions profiling/nvprof-connector/CMakeLists.txt

This file was deleted.

78 changes: 0 additions & 78 deletions profiling/nvprof-connector/kp_nvprof_connector_domain.h

This file was deleted.

4 changes: 4 additions & 0 deletions profiling/nvtx-connector/CMakeLists.txt
@@ -0,0 +1,4 @@
find_package(CUDAToolkit REQUIRED)
kp_add_library(kp_nvtx_connector kp_nvtx_connector.cpp)

target_link_libraries(kp_nvtx_connector CUDA::nvToolsExt)
Original file line number Diff line number Diff line change
@@ -4,15 +4,15 @@ LDFLAGS=-L$(CUDA_ROOT)/lib64
LIBS=-lnvToolsExt
SHARED_CXXFLAGS=-shared -fPIC

all: kp_nvprof_connector.so
all: kp_nvtx_connector.so

MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))

CXXFLAGS+=-I${MAKEFILE_PATH} -I${MAKEFILE_PATH}/../../common/makefile-only -I${MAKEFILE_PATH}../all
CXXFLAGS+=-I${MAKEFILE_PATH} -I${MAKEFILE_PATH}../../common/makefile-only -I${MAKEFILE_PATH}../all

kp_nvprof_connector.so: ${MAKEFILE_PATH}kp_nvprof_connector.cpp
kp_nvtx_connector.so: ${MAKEFILE_PATH}kp_nvtx_connector.cpp
$(CXX) $(SHARED_CXXFLAGS) $(CXXFLAGS) $(LDFLAGS) \
-o $@ ${MAKEFILE_PATH}kp_nvprof_connector.cpp $(LIBS)
-o $@ ${MAKEFILE_PATH}kp_nvtx_connector.cpp $(LIBS)

clean:
rm *.so
rm -f kp_nvtx_connector.so
Original file line number Diff line number Diff line change
@@ -18,17 +18,25 @@
#include <cstdint>
#include <vector>
#include <string>
#include <limits>

#include "nvToolsExt.h"

#include "kp_core.hpp"

static bool tool_globfences;

namespace KokkosTools {
namespace NVProfConnector {
namespace NVTXConnector {

void kokkosp_request_tool_settings(const uint32_t,
Kokkos_Tools_ToolSettings* settings) {
settings->requires_global_fencing = false;
settings->requires_global_fencing = true;
if (tool_globfences) {
settings->requires_global_fencing = true;
} else {
settings->requires_global_fencing = false;
}
}

void kokkosp_init_library(const int loadSeq, const uint64_t interfaceVer,
@@ -39,9 +47,14 @@ void kokkosp_init_library(const int loadSeq, const uint64_t interfaceVer,
loadSeq, (unsigned long long)(interfaceVer));
printf("-----------------------------------------------------------\n");

nvtxNameOsThread(pthread_self(), "Application Main Thread");
nvtxMarkA("Kokkos::Initialization Complete");
}
const char* tool_global_fences = getenv("KOKKOS_TOOLS_GLOBALFENCES");
if (NULL != tool_global_fences) {
tool_globfences = (atoi(tool_global_fences) != 0);
nvtxNameOsThread(pthread_self(), "Application Main Thread");
nvtxMarkA("Kokkos::Initialization Complete");
}

} // end kokkosp_init_library
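
The environment-variable handling can be looked at on its own. The sketch below uses the same `getenv` and `atoi` calls as the hunk above, but wraps them in a hypothetical helper, `env_flag`, with an explicit default. Defaulting to `true` follows the commit message's stated intent and is an assumption of this sketch, not a statement about what the connector does when the variable is unset.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: read an environment variable as a boolean.
// Unset -> default_value; otherwise true iff atoi() yields a non-zero value.
bool env_flag(const char* var_name, bool default_value) {
  assert(var_name != nullptr);
  const char* value = std::getenv(var_name);
  if (value == nullptr) return default_value;
  return std::atoi(value) != 0;
}

// Decide whether the tool should ask Kokkos for global fencing.
bool want_global_fences() {
  return env_flag("KOKKOS_TOOLS_GLOBALFENCES", /*default_value=*/true);
}
```

Because `atoi` returns 0 for non-numeric text, values such as `yes` or `on` count as "off" under this parsing; only non-zero integers enable the flag.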

void kokkosp_finalize_library() {
printf("-----------------------------------------------------------\n");
@@ -87,24 +100,6 @@ struct Section {
std::vector<Section> kokkosp_sections;
} // namespace

Kokkos::Tools::Experimental::EventSet get_event_set() {
Kokkos::Tools::Experimental::EventSet my_event_set;
memset(&my_event_set, 0,
sizeof(my_event_set)); // zero any pointers not set here
my_event_set.request_tool_settings = kokkosp_request_tool_settings;
my_event_set.init = kokkosp_init_library;
my_event_set.finalize = kokkosp_finalize_library;
my_event_set.push_region = kokkosp_push_profile_region;
my_event_set.pop_region = kokkosp_pop_profile_region;
my_event_set.begin_parallel_for = kokkosp_begin_parallel_for;
my_event_set.begin_parallel_reduce = kokkosp_begin_parallel_reduce;
my_event_set.begin_parallel_scan = kokkosp_begin_parallel_scan;
my_event_set.end_parallel_for = kokkosp_end_parallel_for;
my_event_set.end_parallel_reduce = kokkosp_end_parallel_reduce;
my_event_set.end_parallel_scan = kokkosp_end_parallel_scan;
return my_event_set;
}

void kokkosp_create_profile_section(const char* name, uint32_t* sID) {
*sID = kokkosp_sections.size();
kokkosp_sections.push_back(
@@ -121,12 +116,61 @@ void kokkosp_stop_profile_section(const uint32_t sID) {
nvtxRangeEnd(section.id);
}

} // namespace NVProfConnector
void kokkosp_profile_event(const char* name) { nvtxMarkA(name); }

void kokkosp_begin_fence(const char* name, const uint32_t deviceId,
uint64_t* handle) {
// filter out fence as this is a duplicate and unneeded (causing the tool to
// hinder performance of application). We use strstr for checking if the
// string contains the label of a fence (we assume the user will always have
// the word fence in the label of the fence).
if (std::strstr(name, "Kokkos Profile Tool Fence")) {
// set the dereferenced execution identifier to be the maximum value of
// uint64_t, which is assumed to never be assigned
*handle = std::numeric_limits<uint64_t>::max();
} else {
nvtxRangeId_t id = nvtxRangeStartA(name);
*handle = id; // handle will be provided back to end_fence
}
}

void kokkosp_end_fence(uint64_t handle) {
nvtxRangeId_t id = handle;
if (handle != std::numeric_limits<uint64_t>::max()) {
nvtxRangeEnd(id);
}
}
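
The fence hooks use a sentinel handle to pair `begin` and `end` calls without opening a range for fences that the tool itself caused. The sketch below reproduces that logic under stated assumptions: `range_start` and `range_end` are hypothetical stand-ins for `nvtxRangeStartA` and `nvtxRangeEnd` so the example builds without the NVTX headers, and `fence_demo`, `kNoRange`, and `open_ranges` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

namespace fence_demo {

// Hypothetical stand-ins for the NVTX range calls; they only track state.
static uint64_t next_range_id = 1;
static int open_ranges        = 0;

uint64_t range_start(const char* /*name*/) {
  ++open_ranges;
  return next_range_id++;
}

void range_end(uint64_t /*id*/) { --open_ranges; }

// Handle value reserved for "no range was opened for this fence".
constexpr uint64_t kNoRange = std::numeric_limits<uint64_t>::max();

void begin_fence(const char* name, uint64_t* handle) {
  assert(name != nullptr && handle != nullptr);
  if (std::strstr(name, "Kokkos Profile Tool Fence") != nullptr) {
    *handle = kNoRange;  // fence injected for the tool: do not record it
  } else {
    *handle = range_start(name);  // user fence: open a range
  }
}

void end_fence(uint64_t handle) {
  if (handle != kNoRange) range_end(handle);  // close only what was opened
}

}  // namespace fence_demo
```

The approach relies on the range identifier never taking the maximum `uint64_t` value, which the source comments also state as an assumption.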

Kokkos::Tools::Experimental::EventSet get_event_set() {
Kokkos::Tools::Experimental::EventSet my_event_set;
memset(&my_event_set, 0,
sizeof(my_event_set)); // zero any pointers not set here
my_event_set.request_tool_settings = kokkosp_request_tool_settings;
my_event_set.init = kokkosp_init_library;
my_event_set.finalize = kokkosp_finalize_library;
my_event_set.push_region = kokkosp_push_profile_region;
my_event_set.pop_region = kokkosp_pop_profile_region;
my_event_set.begin_parallel_for = kokkosp_begin_parallel_for;
my_event_set.begin_parallel_reduce = kokkosp_begin_parallel_reduce;
my_event_set.begin_parallel_scan = kokkosp_begin_parallel_scan;
my_event_set.end_parallel_for = kokkosp_end_parallel_for;
my_event_set.end_parallel_reduce = kokkosp_end_parallel_reduce;
my_event_set.end_parallel_scan = kokkosp_end_parallel_scan;
my_event_set.create_profile_section = kokkosp_create_profile_section;
my_event_set.start_profile_section = kokkosp_start_profile_section;
my_event_set.stop_profile_section = kokkosp_stop_profile_section;
my_event_set.profile_event = kokkosp_profile_event;
my_event_set.begin_fence = kokkosp_begin_fence;
my_event_set.end_fence = kokkosp_end_fence;
return my_event_set;
}

} // namespace NVTXConnector
} // namespace KokkosTools

extern "C" {

namespace impl = KokkosTools::NVProfConnector;
namespace impl = KokkosTools::NVTXConnector;

EXPOSE_TOOL_SETTINGS(impl::kokkosp_request_tool_settings)
EXPOSE_INIT(impl::kokkosp_init_library)
@@ -139,5 +183,10 @@ EXPOSE_BEGIN_PARALLEL_SCAN(impl::kokkosp_begin_parallel_scan)
EXPOSE_END_PARALLEL_SCAN(impl::kokkosp_end_parallel_scan)
EXPOSE_BEGIN_PARALLEL_REDUCE(impl::kokkosp_begin_parallel_reduce)
EXPOSE_END_PARALLEL_REDUCE(impl::kokkosp_end_parallel_reduce)
// TODO: expose section stuff
EXPOSE_CREATE_PROFILE_SECTION(impl::kokkosp_create_profile_section)
EXPOSE_START_PROFILE_SECTION(impl::kokkosp_start_profile_section)
EXPOSE_STOP_PROFILE_SECTION(impl::kokkosp_stop_profile_section)
EXPOSE_PROFILE_EVENT(impl::kokkosp_profile_event);
EXPOSE_BEGIN_FENCE(impl::kokkosp_begin_fence);
EXPOSE_END_FENCE(impl::kokkosp_end_fence);
} // extern "C"
