rename nvprof-connector to nvtx-connector, add 3 more hooks #139

Merged
merged 21 commits into from
Aug 24, 2023
Changes from 16 commits
2 changes: 1 addition & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -179,7 +179,7 @@ endif()

# GPU profilers
if(Kokkos_ENABLE_CUDA)
add_subdirectory(profiling/nvprof-connector)
add_subdirectory(profiling/nvtx-connector)
add_subdirectory(profiling/nvprof-focused-connector)
endif()
if(Kokkos_ENABLE_HIP)
Expand Down
15 changes: 9 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Note: `Kokkos` must be configured with `Kokkos_ENABLE_LIBDL=ON` to load profilin

## General Usage

To use one of the tools you have to compile it, which will generate a dynamic library. Before executing the Kokkos application you then have to set the environment variable `KOKKOS_TOOLS_LIBS` to point to the dynamic library e.g. in the `bash` shell:
To use one of the tools you have to compile it, which will generate a dynamic library. Before executing the Kokkos application you then have to set the environment variable `KOKKOS_TOOLS_LIBS` to point to the dynamic library e.g. in the `bash` shell:
```
export KOKKOS_TOOLS_LIBS=${HOME}/kokkos-tools/src/tools/memory-events/kp_memory_event.so
```
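For orientation, a library loaded through `KOKKOS_TOOLS_LIBS` is a shared object that exports C-linkage `kokkosp_*` callbacks. The sketch below is a minimal, hypothetical kernel-counting tool and not one of the tools in this repository. The counter `g_kernels_started` is illustrative, and the last two parameters of `kokkosp_init_library` are assumed here (the diff only shows the first two).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical tool state (illustrative, not part of Kokkos Tools).
static uint64_t g_kernels_started = 0;

extern "C" {

// Called once when the tool library is loaded. The trailing two parameters
// are an assumption; only loadSeq and interfaceVer appear in the diff.
void kokkosp_init_library(const int loadSeq, const uint64_t interfaceVer,
                          const uint32_t /*devInfoCount*/,
                          void* /*deviceInfo*/) {
  std::printf("demo tool: load sequence %d, interface version %llu\n",
              loadSeq, (unsigned long long)interfaceVer);
}

// Called before a parallel_for; the tool hands back an id through kID.
void kokkosp_begin_parallel_for(const char* name, const uint32_t /*devID*/,
                                uint64_t* kID) {
  assert(kID != nullptr);
  *kID = g_kernels_started++;
  std::printf("demo tool: begin parallel_for '%s' (id %llu)\n", name,
              (unsigned long long)*kID);
}

// Called after the parallel_for with the id assigned above.
void kokkosp_end_parallel_for(const uint64_t kID) {
  std::printf("demo tool: end parallel_for (id %llu)\n",
              (unsigned long long)kID);
}

// Called once at shutdown.
void kokkosp_finalize_library() {
  std::printf("demo tool: %llu parallel_for kernels seen\n",
              (unsigned long long)g_kernels_started);
}

}  // extern "C"
```

Built with `-shared -fPIC` (the flags the connector Makefile in this PR uses), the resulting `.so` would be the file that `KOKKOS_TOOLS_LIBS` points to.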
Expand Down Expand Up @@ -69,18 +69,21 @@ The following provides an overview of the tools available in the set of Kokkos T

Like VTuneConnector but turns profiling off outside of kernels. Should be used in conjunction with the KernelFilter tool.

+ [**NVTXConnector:**](https://github.com/kokkos/kokkos-tools/wiki/NVTXConnector)

Provides Kokkos kernel names to NVTX, so that analysis can be performed on a per-kernel basis.

+ [**Timemory:**](https://github.com/kokkos/kokkos-tools/wiki/Timemory)

Modular connector for accumulating timing, memory usage, hardware counters, and other various metrics.
Supports controlling VTune, CUDA profilers, and TAU + kernel name forwarding to VTune, NVTX, TAU,
Caliper, and LIKWID.

##### If you need to write your own plug-in, this provides a straight-forward API to writing the plug-in.
##### If you need to write your own plug-in, this provides a straight-forward API to writing the plug-in.

Defining a timemory component will enable your plug-in to output to stdout, text, and JSON,
accumulate statistics, and utilize various portable function calls for common needs w.r.t. timers,
resource usage, etc.


# Building Kokkos Tools

Expand All @@ -91,18 +94,18 @@ Use either CMake or Makefile to build Kokkos Tools.
1. create a build directory in Kokkos Tools, e.g., type `mkdir myBuild; cd myBuild`
2. To configure, type `ccmake ..` and set any options you would like to enable/disable.
3. To compile, type `make`
4. To install, type `make install`
4. To install, type `make install`

## Using make

To build with make, simply type `make` within each subdirectory of Kokkos Tools.


Building with Makefiles is currently recommended.
Building using `make` is currently recommended. Eventually, the preferred method of building will be `cmake`.

# Running a Kokkos-based Application with a tool

Given your tool shared library <name_of_tool_shared_library>.so (which contains kokkos profiling callback functions) and an application executable called yourApplication.exe, type:
Given your tool shared library `<name_of_tool_shared_library>.so` (which contains Kokkos profiling callback functions) and an application executable called `yourApplication.exe`, type:

`export KOKKOS_TOOLS_LIBS=${YOUR_KOKKOS_TOOLS_DIR}/<name_of_tool_shared_lib>; ./yourApplication.exe`

Expand Down
4 changes: 2 additions & 2 deletions profiling/all/kp_all.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ KOKKOSTOOLS_EXTERN_EVENT_SET(VTuneFocusedConnector)
KOKKOSTOOLS_EXTERN_EVENT_SET(VariorumConnector)
#endif
#ifdef KOKKOSTOOLS_HAS_NVPROF
KOKKOSTOOLS_EXTERN_EVENT_SET(NVProfConnector)
KOKKOSTOOLS_EXTERN_EVENT_SET(NVTXConnector)
KOKKOSTOOLS_EXTERN_EVENT_SET(NVProfFocusedConnector)
#endif
#ifdef KOKKOSTOOLS_HAS_CALIPER
Expand Down Expand Up @@ -91,7 +91,7 @@ EventSet get_event_set(const char* profiler, const char* config_str) {
handlers["caliper"] = cali::get_kokkos_event_set(config_str);
#endif
#ifdef KOKKOSTOOLS_HAS_NVPROF
handlers["nvprof-connector"] = NVProfConnector::get_event_set();
handlers["nvtx-connector"] = NVTXConnector::get_event_set();
handlers["nvprof-focused-connector"] =
NVProfFocusedConnector::get_event_set();
#endif
Expand Down
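The `handlers` map in `kp_all.cpp` keys each connector's event set by a string, so the rename also changes the name a caller has to pass. A minimal sketch of that lookup pattern follows. `DemoEventSet`, `DemoNVTXConnector`, and `demo_get_event_set` are hypothetical stand-ins with only two callbacks, and the "return false for unknown names" behavior is an assumption, since the diff does not show how the real `get_event_set` treats unknown keys.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for Kokkos::Tools::Experimental::EventSet: a plain
// struct of callback pointers (hypothetical, only two members shown).
struct DemoEventSet {
  void (*init)();
  void (*finalize)();
};

namespace DemoNVTXConnector {
void init() {}
void finalize() {}

DemoEventSet get_event_set() {
  DemoEventSet s{};  // value-initialize so unset callbacks stay null
  s.init     = init;
  s.finalize = finalize;
  return s;
}
}  // namespace DemoNVTXConnector

// Look up a connector by name. Returning false for an unknown name is an
// assumption made for this sketch.
bool demo_get_event_set(const std::string& profiler, DemoEventSet* out) {
  assert(out != nullptr);
  static const std::map<std::string, DemoEventSet> handlers = {
      {"nvtx-connector", DemoNVTXConnector::get_event_set()}};
  const auto it = handlers.find(profiler);
  if (it == handlers.end()) return false;
  *out = it->second;
  return true;
}
```

In this sketch the old key `"nvprof-connector"` no longer resolves, which illustrates why callers that select the connector by name would need updating after the rename.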
13 changes: 13 additions & 0 deletions profiling/all/kp_core.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ using Kokkos::Tools::SpaceHandle;
#define EXPOSE_STOP_PROFILE_SECTION(FUNC_NAME)
#define EXPOSE_DESTROY_PROFILE_SECTION(FUNC_NAME)
#define EXPOSE_PROFILE_EVENT(FUNC_NAME)
#define EXPOSE_BEGIN_FENCE(FUNC_NAME)
#define EXPOSE_END_FENCE(FUNC_NAME)

#else

Expand Down Expand Up @@ -165,6 +167,17 @@ using Kokkos::Tools::SpaceHandle;
FUNC_NAME(name); \
}

#define EXPOSE_BEGIN_FENCE(FUNC_NAME) \
__attribute__((weak)) void kokkosp_begin_fence( \
const char* name, const uint32_t deviceId, uint64_t* handle) { \
FUNC_NAME(name, deviceId, handle); \
}

#define EXPOSE_END_FENCE(FUNC_NAME) \
__attribute__((weak)) void kokkosp_end_fence(uint64_t handle) { \
FUNC_NAME(handle); \
}

#define EXPOSE_DUAL_VIEW_SYNC(FUNC_NAME) \
__attribute__((weak)) void kokkosp_dual_view_sync( \
const char* name, const void* const ptr, bool is_device) { \
Expand Down
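The new `EXPOSE_BEGIN_FENCE` / `EXPOSE_END_FENCE` macros follow the same pattern as the existing ones: each emits a weak function with the `kokkosp_*` name and forwards to a namespaced implementation. A minimal sketch of that forwarding is shown below, assuming a GCC- or Clang-compatible compiler for `__attribute__((weak))`. `DEMO_EXPOSE_END_FENCE` and `demo_tool` are hypothetical names that mirror the shape of the macro in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as EXPOSE_END_FENCE in the diff: define a weak function named
// kokkosp_end_fence that forwards its argument to FUNC_NAME.
#define DEMO_EXPOSE_END_FENCE(FUNC_NAME)                          \
  __attribute__((weak)) void kokkosp_end_fence(uint64_t handle) { \
    FUNC_NAME(handle);                                            \
  }

namespace demo_tool {
// Records the last handle so the forwarding can be observed.
static uint64_t last_handle = 0;
void kokkosp_end_fence(uint64_t handle) { last_handle = handle; }
}  // namespace demo_tool

extern "C" {
// Expands to a C-linkage kokkosp_end_fence that calls the namespaced one.
DEMO_EXPOSE_END_FENCE(demo_tool::kokkosp_end_fence)
}  // extern "C"
```

Because the exported symbol is weak, a strong definition of the same name elsewhere in the link would take precedence over it.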
4 changes: 0 additions & 4 deletions profiling/nvprof-connector/CMakeLists.txt

This file was deleted.

78 changes: 0 additions & 78 deletions profiling/nvprof-connector/kp_nvprof_connector_domain.h

This file was deleted.

4 changes: 4 additions & 0 deletions profiling/nvtx-connector/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
find_package(CUDAToolkit REQUIRED)
kp_add_library(kp_nvtx_connector kp_nvtx_connector.cpp)

target_link_libraries(kp_nvtx_connector CUDA::nvToolsExt)
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,15 @@ LDFLAGS=-L$(CUDA_ROOT)/lib64
LIBS=-lnvToolsExt
SHARED_CXXFLAGS=-shared -fPIC

all: kp_nvprof_connector.so
all: kp_nvtx_connector.so

MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))

CXXFLAGS+=-I${MAKEFILE_PATH} -I${MAKEFILE_PATH}/../../common/makefile-only -I${MAKEFILE_PATH}../all

kp_nvprof_connector.so: ${MAKEFILE_PATH}kp_nvprof_connector.cpp
kp_nvtx_connector.so: ${MAKEFILE_PATH}kp_nvtx_connector.cpp
$(CXX) $(SHARED_CXXFLAGS) $(CXXFLAGS) $(LDFLAGS) \
-o $@ ${MAKEFILE_PATH}kp_nvprof_connector.cpp $(LIBS)
-o $@ ${MAKEFILE_PATH}kp_nvtx_connector.cpp $(LIBS)

clean:
rm *.so
rm -f kp_nvtx_connector.so
Original file line number Diff line number Diff line change
Expand Up @@ -18,17 +18,25 @@
#include <cstdint>
#include <vector>
#include <string>

#include <limits>
#include "nvToolsExt.h"

#include "kp_core.hpp"

static int tool_globfences; // use an integer for other options
using namespace std;
Review comment (Member):

Why? That's generally a bad idea.

namespace KokkosTools {
namespace NVProfConnector {
namespace NVTXConnector {

void kokkosp_request_tool_settings(const uint32_t,
Kokkos_Tools_ToolSettings* settings) {
settings->requires_global_fencing = false;
settings->requires_global_fencing = true;
Review comment (Member):

Without this, kernel renaming doesn't work, and the regions all measure only the kernel launch.

if (tool_globfences == 1) {
settings->requires_global_fencing = true;
} else {
settings->requires_global_fencing = false;
}
// leave the door open for other nonzero values of tools
}

void kokkosp_init_library(const int loadSeq, const uint64_t interfaceVer,
Expand All @@ -39,6 +47,14 @@ void kokkosp_init_library(const int loadSeq, const uint64_t interfaceVer,
loadSeq, (unsigned long long)(interfaceVer));
printf("-----------------------------------------------------------\n");

const char* tool_global_fences = getenv("KOKKOS_TOOLS_GLOBALFENCES");
if (NULL != tool_global_fences) {
tool_globfences = atoi(tool_global_fences);
} else {
tool_globfences =
1; // default to 1 to be conservative for capturing state by the tool
}

nvtxNameOsThread(pthread_self(), "Application Main Thread");
nvtxMarkA("Kokkos::Initialization Complete");
}
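The `KOKKOS_TOOLS_GLOBALFENCES` handling above can be read as two small steps: parse the variable with a default of 1, then enable global fencing only for the value 1. The sketch below isolates that logic so its edge cases are visible. `demo_parse_global_fences` and `demo_requires_global_fencing` are hypothetical helpers, not functions from the PR.

```cpp
#include <cassert>
#include <cstdlib>

// Mirrors the diff's parsing: an unset variable defaults to 1 (global
// fencing on); otherwise the text is converted with atoi.
int demo_parse_global_fences(const char* raw) {
  if (raw == nullptr) return 1;
  return std::atoi(raw);
}

// Mirrors kokkosp_request_tool_settings: only the value 1 enables global
// fencing, every other value disables it.
bool demo_requires_global_fencing(int tool_globfences) {
  return tool_globfences == 1;
}
```

A caller would pass `std::getenv("KOKKOS_TOOLS_GLOBALFENCES")` to the first helper. Note that `atoi` yields 0 for non-numeric text, so under this logic a value such as `abc` disables global fencing, as does any integer other than 1.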
Expand Down Expand Up @@ -87,24 +103,6 @@ struct Section {
std::vector<Section> kokkosp_sections;
} // namespace

Kokkos::Tools::Experimental::EventSet get_event_set() {
Kokkos::Tools::Experimental::EventSet my_event_set;
memset(&my_event_set, 0,
sizeof(my_event_set)); // zero any pointers not set here
my_event_set.request_tool_settings = kokkosp_request_tool_settings;
my_event_set.init = kokkosp_init_library;
my_event_set.finalize = kokkosp_finalize_library;
my_event_set.push_region = kokkosp_push_profile_region;
my_event_set.pop_region = kokkosp_pop_profile_region;
my_event_set.begin_parallel_for = kokkosp_begin_parallel_for;
my_event_set.begin_parallel_reduce = kokkosp_begin_parallel_reduce;
my_event_set.begin_parallel_scan = kokkosp_begin_parallel_scan;
my_event_set.end_parallel_for = kokkosp_end_parallel_for;
my_event_set.end_parallel_reduce = kokkosp_end_parallel_reduce;
my_event_set.end_parallel_scan = kokkosp_end_parallel_scan;
return my_event_set;
}

void kokkosp_create_profile_section(const char* name, uint32_t* sID) {
*sID = kokkosp_sections.size();
kokkosp_sections.push_back(
Expand All @@ -121,12 +119,61 @@ void kokkosp_stop_profile_section(const uint32_t sID) {
nvtxRangeEnd(section.id);
}

} // namespace NVProfConnector
void kokkosp_profile_event(const char* name) { nvtxMarkA(name); }

void kokkosp_begin_fence(const char* name, const uint32_t deviceId,
uint64_t* handle) {
// Filter out fences whose label contains "Kokkos Profile Tool Fence": they
// are duplicates and unneeded here, and tracking them would only cause the
// tool to hinder the performance of the application. std::strstr performs
// the substring check on the fence label.
if (std::strstr(name, "Kokkos Profile Tool Fence")) {
// set the dereferenced execution identifier to be the maximum value of
// uint64_t, which is assumed to never be assigned
*handle = numeric_limits<uint64_t>::max();
} else {
nvtxRangeId_t id = nvtxRangeStartA(name);
*handle = id; // handle will be provided back to end_fence
}
}

void kokkosp_end_fence(uint64_t handle) {
nvtxRangeId_t id = handle;
if (handle != numeric_limits<uint64_t>::max()) {
nvtxRangeEnd(id);
}
}

Kokkos::Tools::Experimental::EventSet get_event_set() {
Kokkos::Tools::Experimental::EventSet my_event_set;
memset(&my_event_set, 0,
sizeof(my_event_set)); // zero any pointers not set here
my_event_set.request_tool_settings = kokkosp_request_tool_settings;
my_event_set.init = kokkosp_init_library;
my_event_set.finalize = kokkosp_finalize_library;
my_event_set.push_region = kokkosp_push_profile_region;
my_event_set.pop_region = kokkosp_pop_profile_region;
my_event_set.begin_parallel_for = kokkosp_begin_parallel_for;
my_event_set.begin_parallel_reduce = kokkosp_begin_parallel_reduce;
my_event_set.begin_parallel_scan = kokkosp_begin_parallel_scan;
my_event_set.end_parallel_for = kokkosp_end_parallel_for;
my_event_set.end_parallel_reduce = kokkosp_end_parallel_reduce;
my_event_set.end_parallel_scan = kokkosp_end_parallel_scan;
my_event_set.create_profile_section = kokkosp_create_profile_section;
my_event_set.start_profile_section = kokkosp_start_profile_section;
my_event_set.stop_profile_section = kokkosp_stop_profile_section;
my_event_set.profile_event = kokkosp_profile_event;
my_event_set.begin_fence = kokkosp_begin_fence;
my_event_set.end_fence = kokkosp_end_fence;
return my_event_set;
}

} // namespace NVTXConnector
} // namespace KokkosTools

extern "C" {

namespace impl = KokkosTools::NVProfConnector;
namespace impl = KokkosTools::NVTXConnector;

EXPOSE_TOOL_SETTINGS(impl::kokkosp_request_tool_settings)
EXPOSE_INIT(impl::kokkosp_init_library)
Expand All @@ -139,5 +186,10 @@ EXPOSE_BEGIN_PARALLEL_SCAN(impl::kokkosp_begin_parallel_scan)
EXPOSE_END_PARALLEL_SCAN(impl::kokkosp_end_parallel_scan)
EXPOSE_BEGIN_PARALLEL_REDUCE(impl::kokkosp_begin_parallel_reduce)
EXPOSE_END_PARALLEL_REDUCE(impl::kokkosp_end_parallel_reduce)
// TODO: expose section stuff
EXPOSE_CREATE_PROFILE_SECTION(impl::kokkosp_create_profile_section)
EXPOSE_START_PROFILE_SECTION(impl::kokkosp_start_profile_section)
EXPOSE_STOP_PROFILE_SECTION(impl::kokkosp_stop_profile_section)
EXPOSE_PROFILE_EVENT(impl::kokkosp_profile_event)
EXPOSE_BEGIN_FENCE(impl::kokkosp_begin_fence)
EXPOSE_END_FENCE(impl::kokkosp_end_fence)
} // extern "C"
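The fence hooks added in this PR rely on a sentinel handle: `kokkosp_begin_fence` stores the maximum `uint64_t` value when it skips a fence, and `kokkosp_end_fence` closes a range only when the handle differs from that sentinel. The sketch below reproduces the pattern without the NVTX library. `demo_range_start` and `demo_range_end` are hypothetical stand-ins for `nvtxRangeStartA` and `nvtxRangeEnd` that merely count open ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Stand-ins for the NVTX range calls so the sketch runs without NVTX
// (hypothetical; they only track how many ranges are open).
static uint64_t g_next_id = 1;
static int g_open_ranges  = 0;
uint64_t demo_range_start(const char* /*name*/) {
  ++g_open_ranges;
  return g_next_id++;
}
void demo_range_end(uint64_t /*id*/) { --g_open_ranges; }

// Sentinel meaning "no range was opened for this fence".
constexpr uint64_t kSkippedFence = std::numeric_limits<uint64_t>::max();

void demo_begin_fence(const char* name, uint64_t* handle) {
  assert(name != nullptr && handle != nullptr);
  if (std::strstr(name, "Kokkos Profile Tool Fence")) {
    *handle = kSkippedFence;  // filtered fence: hand back the sentinel
  } else {
    *handle = demo_range_start(name);  // handle comes back in end_fence
  }
}

void demo_end_fence(uint64_t handle) {
  // Close a range only if begin_fence actually opened one.
  if (handle != kSkippedFence) demo_range_end(handle);
}
```

The design assumes a real range id never equals the sentinel, which is the same assumption the PR's code comment states.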