Programatically filter out tool-induced fences from the sampler tool utility #194

Closed
wants to merge 31 commits into from

Conversation

vlkale
Contributor

@vlkale vlkale commented May 5, 2023

Fix #179.

@vlkale vlkale self-assigned this May 5, 2023
Vivek Kale and others added 13 commits May 5, 2023 14:50
Adding tool-invoked_fence function. Note that the fence function is doing global fencing, and that end_parallel_* will need a devID if per-device fencing is done in the future.
…device ID for fence (which could be obtained through the space handle).
…device ID for fence (which could be obtained through the space handle). Fixing with formatting.
…terface function. Still need to get device ID for fence (which could be obtained through the space handle). Fixing with formatting.
@vlkale vlkale changed the title Progmattically filter out tool-induced fences from the sampler tool callback Programatically filter out tool-induced fences from the sampler tool callback Jun 8, 2023
@vlkale vlkale changed the title Programatically filter out tool-induced fences from the sampler tool callback Programatically filter out tool-induced fences from the sampler tool utility Jun 8, 2023
@vlkale
Contributor Author

vlkale commented Jul 20, 2023

This is an example run and its output on a MacBook using g++ (default) and the Kokkos Serial host backend for stream, without and with Kokkos Tools global fencing.

Kokkos Tools Global Fence turned off (no auto-fencing)

s1017105ca:stream vlkale$  export KOKKOS_TOOLS_GLOBALFENCES=0;  ./stream-1000iters.exe 
-------------------------------------------------------------
Kokkos STREAM Benchmark
-------------------------------------------------------------
KokkosP: Next library to call: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-tools/vbld2/profiling/simple-kernel-timer/libkp_kernel_timer.dylib
KokkosP: Loading child library ..
KokkosP: Simple Kernel Timer Library Initialized (sequence is 1, version: 20211015)
KokkosP: Function Status:
KokkosP: begin-parallel-for:      yes
KokkosP: begin-parallel-scan:     yes
KokkosP: begin-parallel-reduce:   yes
KokkosP: end-parallel-for:        yes
KokkosP: end-parallel-scan:       yes
KokkosP: end-parallel-reduce:     yes
KokkosP: Sampling rate set to: 105
Reports fastest timing per kernel
Creating Views...
Memory Sizes:
- Array Size:    100000000
- Per Array:           800.00 MB
- Total:              2400.00 MB
Benchmark kernels will be performed for 1000 iterations.
-------------------------------------------------------------
Initializing Views...
Starting benchmarking...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 1 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 2 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 3 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 4 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 5 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 6 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 7 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 8 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 9 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 10 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 11 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 12 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 13 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 14 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 15 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 16 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 17 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 18 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 19 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 20 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 21 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 22 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 23 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 24 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 25 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 26 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 27 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 28 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 29 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 30 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 31 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 32 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 33 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 34 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 35 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 36 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 37 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 38 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 39 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 40 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 41 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 42 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 43 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 44 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 45 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 46 calling child-end function...
Performing validation...
All solutions checked and verified.
-------------------------------------------------------------
Set                12559.15 MB/s
Copy               15953.07 MB/s
Scale              15780.36 MB/s
Add                17448.99 MB/s
Triad              17189.12 MB/s
-------------------------------------------------------------
KokkosP: Kernel timing written to /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos/benchmarks/stream/s1017105ca-28032.dat 

Kokkos Tools Global Fence (auto-fencing) on

s1017105ca:stream vlkale$ export KOKKOS_TOOLS_GLOBALFENCES=1; ./stream-1000iters.exe 
-------------------------------------------------------------
Kokkos STREAM Benchmark
-------------------------------------------------------------
KokkosP: Next library to call: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-tools/vbld2/profiling/simple-kernel-timer/libkp_kernel_timer.dylib
KokkosP: Loading child library ..
KokkosP: Simple Kernel Timer Library Initialized (sequence is 1, version: 20211015)
KokkosP: Function Status:
KokkosP: begin-parallel-for:      yes
KokkosP: begin-parallel-scan:     yes
KokkosP: begin-parallel-reduce:   yes
KokkosP: end-parallel-for:        yes
KokkosP: end-parallel-scan:       yes
KokkosP: end-parallel-reduce:     yes
KokkosP: Sampling rate set to: 105
Reports fastest timing per kernel
Creating Views...
Memory Sizes:
- Array Size:    100000000
- Per Array:           800.00 MB
- Total:              2400.00 MB
Benchmark kernels will be performed for 1000 iterations.
-------------------------------------------------------------
Initializing Views...
Starting benchmarking...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 1 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 2 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 3 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 4 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 5 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 6 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 7 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 8 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 9 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 10 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 11 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 12 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 13 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 14 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 15 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 16 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 17 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 18 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 19 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 20 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 21 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 22 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 23 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 24 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 25 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 26 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 27 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 28 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 29 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 30 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 31 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 32 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 33 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 34 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 35 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 36 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 37 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 38 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 39 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 40 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 41 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 42 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 43 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 44 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 45 calling child-end function...
KokkosP: sample 1 calling child-begin function...
KokkosP: sample 46 calling child-end function...
Performing validation...
All solutions checked and verified.
-------------------------------------------------------------
Set                12224.61 MB/s
Copy               15412.82 MB/s
Scale              15549.36 MB/s
Add                16772.14 MB/s
Triad              16554.17 MB/s
-------------------------------------------------------------
KokkosP: Kernel timing written to /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos/benchmarks/stream/s1017105ca-28446.dat 

@vlkale vlkale marked this pull request as ready for review July 20, 2023 19:40
@vlkale vlkale marked this pull request as draft July 20, 2023 20:32
Comment on lines 10 to 11
// using Kokkos::Tools::Experimental;
// using mytpi_type = Kokkos::Tools::Experimental::ToolProgrammingInterface;
Contributor

Remove all commented lines you are not using.

Contributor Author

Thanks for that

Contributor Author

Fixed

common/kokkos-sampler/kp_sampler_skip.cpp (outdated)
@@ -82,9 +125,7 @@ void kokkosp_init_library(const int loadSeq, const uint64_t interfaceVer,
printf("KokkosP: Next library to call: %s\n", nextLibrary);
printf("KokkosP: Loading child library ..\n");
}

Contributor

What's up with all these whitespace changes?

Contributor Author

I will fix, making sure the code file is indeed getting processed through clang-format.

Contributor Author

@vlkale vlkale Jul 21, 2023

@masterleinad Does the new file committed address the issues you have with whitespace changes?

@masterleinad
Contributor

Sample result on MacBook using g++ (default) and Serial host backend for stream, with and without the fence.

Can you please summarize and interpret the results?

@vlkale
Contributor Author

vlkale commented Jul 20, 2023

Sample result on MacBook using g++ (default) and Serial host backend for stream, with and without the fence.

Can you please summarize and interpret the results?

@masterleinad The results show the output of a test of the proposed new sampler, with its fencing optimization, for the case where global fencing is on (KOKKOS_TOOLS_GLOBALFENCES=1) and the case where global fencing is off (KOKKOS_TOOLS_GLOBALFENCES=0).

The output shows that, given a sampling skip rate of 105, only 46 kernel invocations out of the 1000 have a Kokkos Tools child event dispatched to the specified Kokkos Tools connector (here simple-kernel-timer). One can interpret this as the sampler behaving correctly in both cases.

Perhaps a better test case would compare the sampler in the Kokkos Tools develop branch against the sampler from this PR (again for both cases, global fences turned on and turned off), with a timing comparison between the two versions. I will add this. Also, of the two configurations, the priority is KOKKOS_TOOLS_GLOBALFENCES=1, i.e., the default.
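
To make the sampling behavior described above concrete, here is a minimal sketch of a skip-rate sampler that issues a fence only for the kernels it actually samples. It is not the code from this PR: `SamplerSketch`, `FenceHook`, `no_op_fence`, and `run_sketch` are hypothetical names, the fence hook is a stand-in for whatever fence function Kokkos hands to a tool, and the rule "sample every `skip_rate`-th kernel" is an assumption about how the skip rate is applied.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a fence hook that a tool might receive from Kokkos.
using FenceHook = void (*)(std::uint32_t device_id);

// Tallies of what the sketch did during a run.
struct SampleStats {
  std::uint64_t kernels;  // kernel-begin events seen
  std::uint64_t samples;  // events forwarded to the child tool
  std::uint64_t fences;   // fences issued on behalf of the tool
};

class SamplerSketch {
 public:
  SamplerSketch(std::uint64_t skip_rate, bool fence_enabled, FenceHook fence)
      : skip_rate_(skip_rate),
        fence_enabled_(fence_enabled),
        fence_(fence),
        stats_{0, 0, 0} {
    assert(skip_rate_ > 0);  // a zero skip rate would divide by zero below
  }

  // Returns true when this kernel invocation is picked as a sample.
  bool begin_kernel(std::uint32_t device_id) {
    ++stats_.kernels;
    // Assumed rule: sample every skip_rate-th kernel, skip all others.
    if (stats_.kernels % skip_rate_ != 0) return false;
    // Fence only on a sampled kernel, and only if fencing is enabled.
    if (fence_enabled_ && fence_ != nullptr) {
      fence_(device_id);
      ++stats_.fences;
    }
    ++stats_.samples;  // a real sampler would call the child tool here
    return true;
  }

  const SampleStats& stats() const { return stats_; }

 private:
  std::uint64_t skip_rate_;
  bool fence_enabled_;
  FenceHook fence_;
  SampleStats stats_;
};

// Placeholder fence that does nothing; a real tool would call into Kokkos.
inline void no_op_fence(std::uint32_t /*device_id*/) {}

// Drives the sketch over total_kernels kernel-begin events on device 0.
inline SampleStats run_sketch(std::uint64_t total_kernels,
                              std::uint64_t skip_rate, bool fence_enabled) {
  SamplerSketch sampler(skip_rate, fence_enabled, &no_op_fence);
  for (std::uint64_t i = 0; i < total_kernels; ++i) {
    sampler.begin_kernel(0);
  }
  return sampler.stats();
}
```

For example, `run_sketch(1050, 105, true)` reports 10 samples and 10 fences, while `run_sketch(1050, 105, false)` reports 10 samples and no fences. The kernel count of 1050 is arbitrary and is not taken from the runs above.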

Removing all commented code (which is no longer needed) as requested.
@vlkale
Contributor Author

vlkale commented Jul 21, 2023

For reference, here is the output of a configure, build, and install on a MacBook Pro (2.9 GHz Quad-Core Intel Core i7):

s1017105ca:vbuild-fenceLimitedSingleLib vlkale$ cmake ..
-- 
-- ConfiguringKokkos-Tools
-- 
-- Found Kokkos installation: /usr/local
		Devices: SERIAL
		Architecture: 
		TPLs: LIBDL
		Compiler: /Library/Developer/CommandLineTools/usr/bin/c++ (AppleClang)
		CMAKE_CXX_FLAGS: 
		Options: DEPRECATED_CODE_4;DEPRECATION_WARNINGS;LAUNCH_COMPILER;IMPL_DESUL_ATOMICS;COMPLEX_ALIGN
-- PAPI support disabled
-- MPI not available. MPI disabled.
CMake Warning at cmake/configure_variorum.cmake:23 (message):
  Variorum not found: set Variorum_ROOT CMake variable or VARIORUM_ROOT
  environment variable to build Variorum connector
Call Stack (most recent call first):
  CMakeLists.txt:95 (include)


CMake Warning at CMakeLists.txt:107 (message):
  Set VTUNE_HOME in environment or VTune_ROOT in build options to build VTune
  connectors


-- Apple OSX target detected.
-- Skipping memory-hwm-mpi (MPI disabled)
-- Building Monolithic KokkosTools library with profilers: kp_kernel_timer_json;kp_kernel_timer;kp_hwm;kp_memory_events;kp_memory_usage;kp_chrome_tracing;kp_space_time_stack;kp_perfetto_connector
-- Configuring done (0.1s)
-- Generating done (0.2s)
-- Build files have been written to: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-tools-develop/vbuild-fenceLimitedSingleLib
s1017105ca:vbuild-fenceLimitedSingleLib vlkale$ make -j
[  6%] Built target kp_kokkos_sampler
[ 12%] Built target kp_kernel_shared
[ 19%] Built target kp_kernel_filter
[ 25%] Built target kp_space_time_stack
[ 32%] Built target kp_kernel_logger
[ 38%] Built target kp_memory_usage
[ 45%] Built target kp_memory_events
[ 51%] Built target kp_hwm
[ 58%] Built target kp_chrome_tracing
[ 67%] Built target kp_perfetto_connector
[ 74%] Built target kp_kernel_timer_json
[ 80%] Built target kp_kernel_timer
[ 87%] Built target kokkostools
[ 93%] Built target kp_reader
[100%] Built target kp_json_writer
s1017105ca:vbuild-fenceLimitedSingleLib vlkale$ make clean;
s1017105ca:vbuild-fenceLimitedSingleLib vlkale$ make -j
[  3%] Building CXX object common/kokkos-sampler/CMakeFiles/kp_kokkos_sampler.dir/kp_sampler_skip.cpp.o
[  6%] Building CXX object profiling/simple-kernel-timer/CMakeFiles/kp_kernel_shared.dir/kp_shared.cpp.o
[  9%] Building CXX object debugging/kernel-logger/CMakeFiles/kp_kernel_logger.dir/kp_kernel_logger.cpp.o
[ 12%] Building CXX object profiling/memory-hwm/CMakeFiles/kp_hwm.dir/kp_hwm.cpp.o
[ 16%] Building CXX object common/kernel-filter/CMakeFiles/kp_kernel_filter.dir/kp_kernel_filter.cpp.o
[ 19%] Building CXX object profiling/memory-usage/CMakeFiles/kp_memory_usage.dir/kp_memory_usage.cpp.o
[ 22%] Building CXX object profiling/memory-events/CMakeFiles/kp_memory_events.dir/kp_memory_events.cpp.o
[ 25%] Building CXX object profiling/space-time-stack/CMakeFiles/kp_space_time_stack.dir/kp_space_time_stack.cpp.o
[ 29%] Building CXX object profiling/perfetto-connector/CMakeFiles/kp_perfetto_connector.dir/libperfetto-connector.cpp.o
[ 32%] Building CXX object profiling/chrome-tracing/CMakeFiles/kp_chrome_tracing.dir/kp_chrome_tracing.cpp.o
[ 35%] Building CXX object profiling/perfetto-connector/CMakeFiles/kp_perfetto_connector.dir/perfetto/perfetto.cc.o
[ 38%] Linking CXX shared library libkp_kokkos_sampler.dylib
[ 38%] Built target kp_kokkos_sampler
[ 41%] Linking CXX shared library libkp_hwm.dylib
[ 45%] Linking CXX shared library libkp_kernel_logger.dylib
[ 45%] Built target kp_hwm
[ 48%] Linking CXX static library libkp_kernel_shared.a
[ 48%] Built target kp_kernel_logger
[ 51%] Linking CXX shared library libkp_memory_usage.dylib
[ 51%] Built target kp_kernel_shared
[ 54%] Linking CXX shared library libkp_memory_events.dylib
[ 54%] Built target kp_memory_usage
[ 58%] Building CXX object profiling/simple-kernel-timer/CMakeFiles/kp_kernel_timer_json.dir/kp_kernel_timer_json.cpp.o
[ 61%] Building CXX object profiling/simple-kernel-timer/CMakeFiles/kp_kernel_timer.dir/kp_kernel_timer.cpp.o
[ 61%] Built target kp_memory_events
[ 64%] Linking CXX shared library libkp_chrome_tracing.dylib
[ 64%] Built target kp_chrome_tracing
[ 67%] Linking CXX shared library libkp_kernel_filter.dylib
[ 70%] Linking CXX shared library libkp_kernel_timer_json.dylib
[ 70%] Built target kp_kernel_filter
[ 74%] Linking CXX shared library libkp_kernel_timer.dylib
[ 74%] Built target kp_kernel_timer_json
[ 74%] Built target kp_kernel_timer
[ 77%] Building CXX object profiling/simple-kernel-timer/CMakeFiles/kp_reader.dir/kp_reader.cpp.o
[ 80%] Building CXX object profiling/simple-kernel-timer/CMakeFiles/kp_json_writer.dir/kp_json_writer.cpp.o
[ 83%] Linking CXX shared library libkp_space_time_stack.dylib
[ 83%] Built target kp_space_time_stack
[ 87%] Linking CXX executable kp_reader
[ 87%] Built target kp_reader
[ 90%] Linking CXX executable kp_json_writer
[ 90%] Built target kp_json_writer
[ 93%] Linking CXX shared library libkp_perfetto_connector.dylib
[ 93%] Built target kp_perfetto_connector
[ 96%] Building CXX object profiling/all/CMakeFiles/kokkostools.dir/kp_all.cpp.o
[100%] Linking CXX shared library libkokkostools.dylib
[100%] Built target kokkostools
s1017105ca:vbuild-fenceLimitedSingleLib vlkale$ make install
[  6%] Built target kp_kernel_filter
[ 12%] Built target kp_kokkos_sampler
[ 19%] Built target kp_kernel_logger
[ 25%] Built target kp_kernel_shared
[ 32%] Built target kp_kernel_timer_json
[ 38%] Built target kp_kernel_timer
[ 45%] Built target kp_reader
[ 51%] Built target kp_json_writer
[ 58%] Built target kp_hwm
[ 64%] Built target kp_memory_events
[ 70%] Built target kp_memory_usage
[ 77%] Built target kp_chrome_tracing
[ 83%] Built target kp_space_time_stack
[ 93%] Built target kp_perfetto_connector
[100%] Built target kokkostools
Install the project...
-- Install configuration: ""
-- Up-to-date: /usr/local/include/kp_all.hpp
-- Up-to-date: /usr/local/include/kp_config.hpp
-- Installing: /usr/local/lib/libkokkostools.dylib
-- Installing: /usr/local/lib/libkp_kernel_shared.a
-- Installing: /usr/local/lib/libkp_kernel_timer_json.dylib
-- Installing: /usr/local/lib/libkp_kernel_timer.dylib
-- Installing: /usr/local/lib/libkp_hwm.dylib
-- Installing: /usr/local/lib/libkp_memory_events.dylib
-- Installing: /usr/local/lib/libkp_memory_usage.dylib
-- Installing: /usr/local/lib/libkp_chrome_tracing.dylib
-- Installing: /usr/local/lib/libkp_space_time_stack.dylib
-- Installing: /usr/local/lib/libkp_perfetto_connector.dylib
-- Installing: /usr/local/lib/cmake/KokkosToolsConfig.cmake
-- Installing: /usr/local/lib/cmake/KokkosToolsConfig-noconfig.cmake

@vlkale vlkale marked this pull request as ready for review July 21, 2023 02:05
@vlkale
Contributor Author

vlkale commented Jul 24, 2023

I am aware that the build test with Kokkos in the CI is failing.
Looking further into this, I found that upon testing on another machine, the Device ID is 100663297. It should be 0.

I don't know if that is the reason for the problem.

I am putting this back into a draft PR to figure things out from my end.

@vlkale vlkale marked this pull request as draft July 24, 2023 18:28
@masterleinad
Contributor

Looking further into this, I found that upon testing on another machine, the Device ID is 100663297. It should be 0.

Note that the device identifier used in the Kokkos::Tools interface is computed as in https://github.com/kokkos/kokkos/blob/4d1c6c351490cc8660e2392ead9fce1af7e379f5/core/src/impl/Kokkos_Profiling_Interface.hpp#L97-L103, and you might want to use https://github.com/kokkos/kokkos/blob/4d1c6c351490cc8660e2392ead9fce1af7e379f5/core/src/impl/Kokkos_Profiling_Interface.hpp#L79-L86 to translate it to a device type, device ID, and instance ID.
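
As a rough illustration of the translation mentioned above, the sketch below unpacks a 32-bit device identifier into type, device ID, and instance ID fields. The bit widths (8 type bits, 7 device bits, 17 instance bits) and the names `DecodedId` and `decode_device_id` are assumptions made for this sketch; the linked `Kokkos_Profiling_Interface.hpp` is the authoritative definition and should be checked for the Kokkos version in use.

```cpp
#include <cstdint>

// Assumed field widths of the packed identifier; verify against the
// Kokkos_Profiling_Interface.hpp of the Kokkos version in use.
constexpr std::uint32_t kTypeBits = 8;
constexpr std::uint32_t kDeviceBits = 7;
constexpr std::uint32_t kInstanceBits = 17;
static_assert(kTypeBits + kDeviceBits + kInstanceBits == 32,
              "fields must fill the 32-bit identifier");

// Hypothetical holder for the three unpacked fields.
struct DecodedId {
  std::uint32_t type;         // numeric device type (backend) field
  std::uint32_t device_id;    // device index within that backend
  std::uint32_t instance_id;  // execution space instance
};

// Splits the packed value into its fields, highest bits first.
constexpr DecodedId decode_device_id(std::uint32_t raw) {
  return DecodedId{raw >> (kDeviceBits + kInstanceBits),
                   (raw >> kInstanceBits) & ((1u << kDeviceBits) - 1u),
                   raw & ((1u << kInstanceBits) - 1u)};
}

// The value reported in this thread, checked at compile time.
static_assert(decode_device_id(100663297u).type == 6u, "type field");
static_assert(decode_device_id(100663297u).device_id == 0u, "device field");
static_assert(decode_device_id(100663297u).instance_id == 1u, "instance field");
```

Under these assumed widths, 100663297 (0x06000001) unpacks to a type field of 6, a device ID of 0, and an instance ID of 1, so the raw value being nonzero would not by itself mean that the device ID is wrong.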

@vlkale
Contributor Author

vlkale commented Jul 24, 2023

@masterleinad Thanks for that. I will take a look at that part of the interface.

@vlkale vlkale marked this pull request as ready for review August 1, 2023 21:29
More  with hashmap and nestedID coming
vlkale added a commit to vlkale/kokkos-tools that referenced this pull request Aug 5, 2023
mistakenly made this change in this PR but it is part of PR kokkos#194
@vlkale vlkale closed this Sep 14, 2023
@vlkale vlkale mentioned this pull request Sep 14, 2023
@vlkale vlkale deleted the FenceOnlyOnSamplePick branch October 27, 2023 18:45
Development

Successfully merging this pull request may close these issues.

Invoke Kokkos::Fence programmatically from within a Kokkos tool (focused on sampler)
2 participants