Tool-induced Fence function pointer is null #233

Closed
vlkale opened this issue Jan 18, 2024 · 6 comments
Assignees
Labels
bug feature Needed feature but software still is correct on its own

Comments

@vlkale
Contributor

vlkale commented Jan 18, 2024

The tool-induced fence function pointer, when accessed through the ToolsProgrammingInterface from within a Kokkos Tools connector, is null.

This is currently a problem only for the randomized sampler (a new feature to be added, shown in PR #213), so it is not a critical bug per se. However, leaving it unfixed limits new Kokkos Tools functionality in which tool callbacks invoke Kokkos core functionality. It does not affect Kokkos core itself, and it does not affect any other existing tool connector. If you want to use tool_invoked_fence() in your Kokkos Tools connector, please be aware that it may not work properly at present.

The problem occurs when I run the stream benchmark with the Serial backend on Perlmutter or on a 2017 MacBook Pro running macOS, but not on a 2022 MacBook Pro. It also doesn't seem to arise with the OpenMP+CUDA backend on Perlmutter, though this may be coincidental and I am looking into it.

Here is the output from the reproducer.

vlap@maclaptop: stream % ./stream.exe
-------------------------------------------------------------
Kokkos STREAM Benchmark
-------------------------------------------------------------
KokkosP: Next library to call: /Users/vivekPersonal/Desktop/vlap/wk/softwareTech/kokkos-tools/vbld/debugging/kernel-logger/libkp_kernel_logger.dylib
KokkosP: Loading child library ..
KokkosP: Kernel Logger Library Initialized (sequence is 1, version: 20211015)
KokkosP: Function Status:
KokkosP: begin-parallel-for:      yes
KokkosP: begin-parallel-scan:     yes
KokkosP: begin-parallel-reduce:   yes
KokkosP: end-parallel-for:        yes
KokkosP: end-parallel-scan:       no
KokkosP: end-parallel-reduce:     yes
KokkosP: Sampling rate set to: 1
KokkosP: Sampling rate provided as input: 1
KokkosP: Sampling probability provided as input: 50.0
KokkosP: Sampling rate set to: 2
KokkosP: Sampling probability set to 50.000000
KokkosP: Seeding random number generator using seed 4 for probabilistic sampling.
KokkosP: Note that both probability and skip rate are set. The Kokkos Tools Sampler utility will invoke a Kokkos Tool child event you specified (e.g., the profiler or debugger tool connector you specified in KOKKOS_TOOLS_LIBS) with only specified sampling probability applied and sampling skip rate set is ignored with no predefined periodicity for sampling used.
KokkosP: The skip rate in the sampler utility is being set to 1.
Reports fastest timing per kernel
Creating Views...
Memory Sizes:
- Array Size:    100000000
- Per Array:           800.00 MB
- Total:              2400.00 MB
Benchmark kernels will be performed for 20 iterations.
-------------------------------------------------------------

KokkosP: sample 1 calling child-begin function...
KokkosP: Sampler utility finding  tool-induced fence function and invoking it.
KokkosP: Sampler utility found fence function. Attempting to invoke  tool-induced fence on device 0.
zsh: segmentation fault  ./stream.exe
vlap@macLaptop stream % ./stream.exe
-------------------------------------------------------------
Kokkos STREAM Benchmark
-------------------------------------------------------------
KokkosP: Next library to call: /Users/vivekPersonal/Desktop/vlap/wk/softwareTech/kokkos-tools/vbld/debugging/kernel-logger/libkp_kernel_logger.dylib
KokkosP: Loading child library ..
KokkosP: Kernel Logger Library Initialized (sequence is 1, version: 20211015)
KokkosP: Function Status:
KokkosP: begin-parallel-for:      yes
KokkosP: begin-parallel-scan:     yes
KokkosP: begin-parallel-reduce:   yes
KokkosP: end-parallel-for:        yes
KokkosP: end-parallel-scan:       no
KokkosP: end-parallel-reduce:     yes
KokkosP: Sampling rate set to: 1
KokkosP: Sampling rate provided as input: 1
KokkosP: Sampling probability provided as input: 50.0
KokkosP: Sampling rate set to: 2
KokkosP: Sampling probability set to 50.000000
KokkosP: Seeding random number generator using seed 4 for probabilistic sampling.
KokkosP: Note that both probability and skip rate are set. The Kokkos Tools Sampler utility will invoke a Kokkos Tool child event you specified (e.g., the profiler or debugger tool connector you specified in KOKKOS_TOOLS_LIBS) with only specified sampling probability applied and sampling skip rate set is ignored with no predefined periodicity for sampling used.
KokkosP: The skip rate in the sampler utility is being set to 1.
Reports fastest timing per kernel
Creating Views...
Memory Sizes:
- Array Size:    100000000
- Per Array:           800.00 MB
- Total:              2400.00 MB
Benchmark kernels will be performed for 20 iterations.
-------------------------------------------------------------
KokkosP: sample 1 calling child-begin function...
KokkosP: Sampler utility finding tool-induced fence function and invoking it.
KokkosP: Sampler utility found fence function. Attempting to invoke tool-induced fence on device 0.
zsh: segmentation fault  ./stream.exe

When only the sampling skip rate is set and no randomized sampling is done (so the tool-invoked fence is not used), the sampler works correctly.
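To make the two sampling modes in the log above concrete, here is a minimal sketch of the decision the sampler has to make on each kernel launch. It assumes only what the startup note states: when both a probability and a skip rate are set, the probability wins and the skip rate is ignored. `SamplerConfig` and `should_sample` are hypothetical names for illustration and are not taken from kp_sampler_skip.cpp.

```cpp
#include <cstdint>
#include <random>

// Hypothetical sampler configuration; field names are illustrative only.
struct SamplerConfig {
  std::uint64_t skip_rate = 1;  // periodic mode: sample every skip_rate-th call
  double probability = -1.0;    // percent in [0, 100]; negative means "unset"
};

// Decide whether this invocation should be forwarded to the child tool.
// A set probability takes precedence over the skip rate, as the sampler's
// startup note describes.
template <class Rng>
bool should_sample(const SamplerConfig& cfg, std::uint64_t invocation,
                   Rng& rng) {
  if (cfg.probability >= 0.0) {
    // Draw a value in [0, 100) and compare it against the percentage.
    std::uniform_real_distribution<double> dist(0.0, 100.0);
    return dist(rng) < cfg.probability;
  }
  // Periodic sampling; a zero skip rate is treated as "never sample".
  return cfg.skip_rate != 0 && invocation % cfg.skip_rate == 0;
}
```

In this sketch only the probabilistic branch corresponds to the path that, in the sampler under discussion, goes on to call the tool-invoked fence; the periodic branch is the one reported to work.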

screenshot_2023-10-26_at_2 10 50_pm_720
screenshot_2023-10-26_at_2 11 27_pm_720

Currently, I think the problem comes from Kokkos_Profiling.hpp, where one can see that there is no tool-induced fence function, as shown in the screenshot of the code file below.

img_1868-1

I am looking into this and will update the Issue as I find more.

@dalg24
Member

dalg24 commented Jan 18, 2024

Please improve the description and use some judgment about what is actually needed for a minimal reproducer.
Make sure you provide the environment in which you ran the stream benchmark.
If this only fails with the version currently proposed in #213, that is not a detail to mention at the end; it probably belongs early in your report.

@vlkale
Contributor Author

vlkale commented Jan 18, 2024

Got it, thanks - I have improved the description and will be putting in action items as well.

For a minimal reproducer, yes: I will look into an even simpler Kokkos program as well. I think this problem occurs with any Kokkos program that has a single invocation of Kokkos::parallel_for(), because the tool-induced fence function gets called from the kokkosp_begin_parallel_for() function in the Kokkos Tools sampler. (I could make that happen by setting the number of outer iterations in stream to 1 and using just one kernel in stream, e.g., "add", but I will look at making a standalone reproducer.)

@vlkale vlkale added feature Needed feature but software still is correct on its own and removed enhancement labels Jan 19, 2024
@masterleinad
Contributor

What does the backtrace look like?

@vlkale
Contributor Author

vlkale commented Jan 22, 2024

@masterleinad Yes, getting this soon - thanks!

@vlkale
Contributor Author

vlkale commented Jan 30, 2024

@masterleinad Here is the backtrace from gdb on Perlmutter for the stream benchmark's copy parallel_for. I got it by taking out the exit(1) in the null-pointer case (i.e., just letting the code run until it fails with a segfault).

The run uses only the Kokkos Serial backend, with 200 iterations of the stream benchmark restricted to the copy kernel, i.e., a single Kokkos::parallel_for executed 200 times. I am using the Kokkos develop branch. The tools code is the use-probability-sampling branch on my fork (github.com/vlkale/kokkos-tools/tree/use-probability-sampling), or Kokkos Tools PR #181.

From the output below, I cannot see in depth what the problem is, but the backtrace does go through Kokkos core. Having looked at this, I now think the problem is in my tool connector and the way I have initialized the pointers, not in Kokkos core. I have reviewed the basics of C++ function pointers, and I think David Poliakoff's code for the tool-invoked fence in Kokkos_Profiling.hpp and Kokkos_Profiling_C_Interface.h is right (though a bit more documentation might help).
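As an aside on the function-pointer point: frame #0 at address 0x0 in the backtrace in this comment is what calling through a null function pointer typically looks like. A connector can guard against that with a check before the call. The sketch below is only an illustration; `FenceFn` and `invoke_fence_checked` are hypothetical names, and the real fence typedef in Kokkos_Profiling_C_Interface.h may differ.

```cpp
#include <cstdint>
#include <cstdio>

// Stand-in for the fence function pointer type (assumed shape: takes a
// device ID, returns nothing).
using FenceFn = void (*)(std::uint32_t);

// Calls the fence only if the pointer is set. Returns false and reports the
// problem instead of jumping to address 0.
bool invoke_fence_checked(FenceFn fence, std::uint32_t device_id) {
  if (fence == nullptr) {
    std::fprintf(stderr,
                 "KokkosP: tool-invoked fence is null; skipping fence on "
                 "device %u\n",
                 static_cast<unsigned>(device_id));
    return false;
  }
  fence(device_id);
  return true;
}
```

A guard like this turns the segfault into a diagnosable message, but it does not explain why the pointer is null in the first place.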

Before I show the run with gdb output, here are the modules loaded on Perlmutter.

vkale3@perlmutter:login21:~/kks/benchmarks/stream> module list
Currently Loaded Modules:
  1) craype-x86-milan                        5) gcc-native/12.3          9) craype-accel-nvidia80               13) cray-dsmml/0.2.2
  2) libfabric/1.15.2.0                      6) perftools-base/23.12.0  10) gpu/1.0                             14) cray-mpich/8.1.28   (mpi)
  3) craype-network-ofi                      7) cpe/23.12               11) cmake/3.22.0          (buildtools)  15) cray-libsci/23.12.5 (math)
  4) xpmem/2.6.2-2.5_2.33__gd067c3f.shasta   8) cudatoolkit/12.2        12) craype/2.7.30         (c)           16) PrgEnv-gnu/8.5.0    (cpe)
  Where:
   mpi:         MPI Providers
   cpe:         Cray Programming Environment Modules
   math:        Mathematical libraries
   buildtools:  Software Build Tools
   c:           Compiler

vkale3@perlmutter:login21:~/kks/benchmarks/stream> gdb ./stream.exe
GNU gdb (GDB; SUSE Linux Enterprise 15) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://bugs.opensuse.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./stream.exe...
(gdb) run
Starting program: /global/u1/v/vkale3/kks/benchmarks/stream/stream.exe
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.31-150300.58.1.x86_64
-------------------------------------------------------------
Kokkos STREAM Benchmark
-------------------------------------------------------------
KokkosP: Next library to call: /global/homes/v/vkale3/kto-inst-dir2/lib64/libkp_kernel_logger.so
KokkosP: Loading child library ..
KokkosP: Kernel Logger Library Initialized (sequence is 1, version: 20211015)
KokkosP: Function Status:
KokkosP: begin-parallel-for:      yes
KokkosP: begin-parallel-scan:     yes
KokkosP: begin-parallel-reduce:   yes
KokkosP: end-parallel-for:        yes
KokkosP: end-parallel-scan:       no
KokkosP: end-parallel-reduce:     yes
KokkosP: Sampling rate set to: 1
KokkosP: Sampling skip rate is set to: 2
KokkosP: Sampling probability is set to 40.000000
KokkosP: Seeding random number generator using clock for random sampling.
KokkosP: You set both the probability and skip rate for the sampler. Only random sampling will be done, using the probabability you set; The skip rate you set will be ignored.
KokkosP: tpi_funcs set to 0x440c40
Reports fastest timing per kernel
Creating Views...
Memory Sizes:
- Array Size:    1000
- Per Array:             0.01 MB
- Total:                 0.02 MB
Benchmark kernels will be performed for 20 iterations.
-------------------------------------------------------------
KokkosP: sample 3 calling child-begin function...
KokkosP: FATAL: Kokkos Tools Programming Interface's tool-invoked Fence is NULL!
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
Missing separate debuginfos, use: zypper install libgcc_s1-debuginfo-12.3.0+git1204-150000.1.16.1.x86_64 libstdc++6-debuginfo-12.3.0+git1204-150000.1.16.1.x86_64
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff79fc247 in KokkosTools::Sampler::kokkosp_begin_parallel_for (name=0x49da70 "Kokkos::View::initialization [c] via memset",
    devID=1, kID=0x7fffffff7aa8) at /global/u1/v/vkale3/kto-dev-vlk/common/kokkos-sampler/kp_sampler_skip.cpp:318
#2  0x000000000040fa26 in Kokkos::Tools::Experimental::invoke_kokkosp_callback<void (*)(char const*, unsigned int, unsigned long*), char const*, unsigned int const&, unsigned long*&> (callback=@0x441ea0: 0x7ffff79fc250 <kokkosp_begin_parallel_for(char const*, uint32_t, uint64_t*)>,
    may_require_global_fencing=Kokkos::Tools::Experimental::MayRequireGlobalFencing::Yes)
    at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/impl/Kokkos_Profiling.cpp:295
#3  Kokkos::Tools::beginParallelFor (kernelPrefix=..., devID=devID@entry=1, kernelID=kernelID@entry=0x7fffffff7aa8)
    at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/impl/Kokkos_Profiling.cpp:306
#4  0x000000000040fa8c in Kokkos::Profiling::beginParallelFor (kernelPrefix=..., devID=devID@entry=1, kernelID=kernelID@entry=0x7fffffff7aa8)
    at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/impl/Kokkos_Profiling.cpp:979
#5  0x000000000040dba5 in Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double, true>::construct_shared_allocation<double> (this=this@entry=0x497560) at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/impl/Kokkos_ViewMapping.hpp:3053
#6  0x000000000040e28f in Kokkos::Impl::ViewMapping<Kokkos::ViewTraits<double*, Kokkos::MemoryTraits<8u> >, void>::allocate_shared<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::HostSpace, Kokkos::Serial> (this=this@entry=0x7fffffff7ed8,
    arg_prop=..., arg_layout=..., execution_space_specified=execution_space_specified@entry=false)
    at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/impl/Kokkos_ViewMapping.hpp:3454
#7  0x000000000040e528 in Kokkos::View<double*, Kokkos::MemoryTraits<8u> >::View<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, std::enable_if<!Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::has_pointer, Kokkos::LayoutRight>::type const&) (this=0x7fffffff7ed0, arg_prop=..., arg_layout=...)
    at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/Kokkos_View.hpp:1464
#8  0x00000000004077f7 in Kokkos::View<double*, Kokkos::MemoryTraits<8u> >::View<char [2]> (arg_N7=18446744073709551615,
--Type <RET> for more, q to quit, c to continue without paging--
    arg_N6=18446744073709551615, arg_N5=18446744073709551615, arg_N4=18446744073709551615, arg_N3=18446744073709551615,
    arg_N2=18446744073709551615, arg_N1=18446744073709551615, arg_N0=1000, arg_label=..., this=0x7fffffff7ed0)
    at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/Kokkos_View.hpp:1580
#9  run_benchmark () at stream-kokkos.cpp:151
#10 0x000000000040879b in main (argc=<optimized out>, argv=0x7fffffff8068) at stream-kokkos.cpp:244
(gdb)
(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff79fc247 in KokkosTools::Sampler::kokkosp_begin_parallel_for (name=0x49da70 "Kokkos::View::initialization [c] via memset", devID=1, kID=0x7fffffff7aa8)
    at /global/u1/v/vkale3/kto-dev-vlk/common/kokkos-sampler/kp_sampler_skip.cpp:318
#2  0x000000000040fa26 in Kokkos::Tools::Experimental::invoke_kokkosp_callback<void (*)(char const*, unsigned int, unsigned long*), char const*, unsigned int const&, unsigned long*&> (callback=@0x441ea0: 0x7ffff79fc250 <kokkosp_begin_parallel_for(char const*, uint32_t, uint64_t*)>,
    may_require_global_fencing=Kokkos::Tools::Experimental::MayRequireGlobalFencing::Yes)
    at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/impl/Kokkos_Profiling.cpp:295
#3  Kokkos::Tools::beginParallelFor (kernelPrefix=..., devID=devID@entry=1, kernelID=kernelID@entry=0x7fffffff7aa8)
    at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/impl/Kokkos_Profiling.cpp:306
#4  0x000000000040fa8c in Kokkos::Profiling::beginParallelFor (kernelPrefix=..., devID=devID@entry=1, kernelID=kernelID@entry=0x7fffffff7aa8)
    at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/impl/Kokkos_Profiling.cpp:979
#5  0x000000000040dba5 in Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double, true>::construct_shared_allocation<double> (
    this=this@entry=0x497560) at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/impl/Kokkos_ViewMapping.hpp:3053
#6  0x000000000040e28f in Kokkos::Impl::ViewMapping<Kokkos::ViewTraits<double*, Kokkos::MemoryTraits<8u> >, void>::allocate_shared<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::HostSpace, Kokkos::Serial> (this=this@entry=0x7fffffff7ed8, arg_prop=..., arg_layout=...,
    execution_space_specified=execution_space_specified@entry=false) at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/impl/Kokkos_ViewMapping.hpp:3454
#7  0x000000000040e528 in Kokkos::View<double*, Kokkos::MemoryTraits<8u> >::View<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, std::enable_if<!Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::has_pointer, Kokkos::LayoutRight>::type const&) (this=0x7fffffff7ed0, arg_prop=...,
    arg_layout=...) at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/Kokkos_View.hpp:1464
#8  0x00000000004077f7 in Kokkos::View<double*, Kokkos::MemoryTraits<8u> >::View<char [2]> (arg_N7=18446744073709551615, arg_N6=18446744073709551615,
    arg_N5=18446744073709551615, arg_N4=18446744073709551615, arg_N3=18446744073709551615, arg_N2=18446744073709551615, arg_N1=18446744073709551615, arg_N0=1000,
    arg_label=..., this=0x7fffffff7ed0) at /global/u1/v/vkale3/kks/benchmarks/stream/../../core/src/Kokkos_View.hpp:1580
#9  run_benchmark () at stream-kokkos.cpp:151
#10 0x000000000040879b in main (argc=<optimized out>, argv=0x7fffffff8068) at stream-kokkos.cpp:244

@vlkale vlkale self-assigned this Feb 16, 2024
@vlkale
Contributor Author

vlkale commented Feb 16, 2024

This got resolved today with @crtrott.

The problem is not in Kokkos core (in particular, not in Kokkos_Profiling.cpp, as suspected in this issue). It comes from the feature enhancement to the sampling tool connector in PR #213, which requires the tool programming interface; the affected files are kp_sampler_skip.cpp and kp_core.hpp.

As a separate note: it would be useful for the Kokkos Tools documentation to explain to tool connector developers that the type Kokkos_Tools_ToolsProgrammingInterface, as used in kokkosp_provide_tool_programming_interface(), is a struct and not a pointer.
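To illustrate the struct-versus-pointer point, here is a rough sketch of a connector that receives the interface by value and keeps its own copy. The types and function names below are mocks written for this example (the real definitions are in Kokkos_Profiling_C_Interface.h and may differ), and this is not the actual change made in PR #213.

```cpp
#include <cstdint>

// Mock of the interface type. Only the shape "a struct holding a fence
// function pointer" is assumed here.
using MockFenceFn = void (*)(std::uint32_t);
struct MockToolProgrammingInterface {
  MockFenceFn fence = nullptr;
};

// Connector-side copy of the interface, filled in by the callback below.
static MockToolProgrammingInterface g_tpi;

// Mock of kokkosp_provide_tool_programming_interface(). The struct arrives
// by value, so the connector copies it; it should not treat it as a pointer
// or keep the address of the parameter, which is gone after the call returns.
void mock_provide_tool_programming_interface(std::uint32_t num_actions,
                                             MockToolProgrammingInterface tpi) {
  (void)num_actions;  // unused in this sketch
  g_tpi = tpi;
}

// Later, e.g. from a begin-parallel-for callback, fence through the copy.
bool fence_device(std::uint32_t device_id) {
  if (g_tpi.fence == nullptr) return false;
  g_tpi.fence(device_id);
  return true;
}
```

Before the mock callback has run, `fence_device` returns false; after it, the stored copy of the fence pointer is used for every call.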

See PR #213.

@vlkale vlkale closed this as completed Feb 16, 2024