Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel logger upgrades for better output and better code comparisons #276

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

csiefer2
Copy link
Contributor

This PR does two things:

  1. Upgrades the output on the Kernel logger to look more like the Tpetra DeepCopyTools --- rather than output space ID's we demanagle that and output names.

  2. Allow you to use the KOKKOS_PROFILE_SUPPRESS_COUNTS variable to suppress the kernel counts in the output --- this is really useful if you're comparing two versions of the code with roughly the same flow, but some different kernel calls here and there. Specifically, code paths with and without UnifiedMemory. The call counting makes diff really hard and so turning it off can make diff easier.

@csiefer2
Copy link
Contributor Author

Yeah, yeah, I'll fix clang later. Right now I want feedback.

@vlkale
Copy link
Contributor

vlkale commented Oct 16, 2024

@csiefer2

The CI check formatting-check with clang-tidy is fine to not pass for now for the sake of a PR review. However, the CI check for simple-build is showing error: ‘int_for_synchronization_reason’ was not declared in this scope. It looks like the rest of the CI checks are also not passing, for the same reason.

Also, JM2C, but it may be good to separate this out into two PRs. I think the first change does not depend on whether the Kokkos Tools user sets KOKKOS_PROFILE_SUPPRESS_COUNTS, AFAIS. I also think the first change isn't controversial while the second change will need more time / thinking IMO: I wonder if the second change has a more efficient implementation given the conditionals in it. This is just a suggestion from my experience - feel free to keep it as is if you think it's better as one PR.

@csiefer2
Copy link
Contributor Author

csiefer2 commented Oct 16, 2024

@vlkale This is a debugging tool that fences all over the place. Host-side conditionals are basically irrelevant in terms of the performance hit the fences create.

As for the compile error, that's because the headers are no longer consistent with the headers in Kokkos. I'll add a fix to that in a second.

@csiefer2
Copy link
Contributor Author

Compiles just fine now...

Signed-off-by: Chris Siefert <[email protected]>
Comment on lines +39 to +60
if (eid.type == DeviceType::Serial)
device_label += "Serial";
else if (eid.type == DeviceType::OpenMP)
device_label += "OpenMP";
else if (eid.type == DeviceType::Cuda)
device_label += "Cuda";
else if (eid.type == DeviceType::HIP)
device_label += "HIP";
else if (eid.type == DeviceType::OpenMPTarget)
device_label += "OpenMPTarget";
else if (eid.type == DeviceType::HPX)
device_label += "HPX";
else if (eid.type == DeviceType::Threads)
device_label += "Threads";
else if (eid.type == DeviceType::SYCL)
device_label += "SYCL";
else if (eid.type == DeviceType::OpenACC)
device_label += "OpenACC";
else if (eid.type == DeviceType::Unknown)
device_label += "Unknown";
else
device_label += "Unknown to KokkosTools";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be opposed to pushing this part (or even the whole function) to Kokkos_Profiling_Interface.hpp. Getting string out of the device id doesn't seem to be specific to the kernel logger tool (also see https://github.com/kokkos/kokkos-tools/pull/265/files#diff-839f34fcb31addd9a48252bebf4d37cf674f6fcdd16539c7039813192674b9c0R165-R187).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concur. I also have to maintain this function in Trilinos, so I would love to put it in Kokkos_Profiling_Interface.hpp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If @crtrott doesn't object, I'll open a Kokkos PR with that.

debugging/kernel-logger/kp_kernel_logger.cpp Outdated Show resolved Hide resolved
@vlkale
Copy link
Contributor

vlkale commented Oct 17, 2024

@vlkale This is a debugging tool that fences all over the place. Host-side conditionals are basically irrelevant in terms of the performance hit the fences create.

True about fencing (aside: maybe in the future this tool could reduce fencing more, e.g., just allowing for global fencing to be turned off, if some Kokkos users want that). Stepping back though and forgetting about efficiency, I think I may be actually misunderstanding the purpose. I thought you are handling a case where the output of kernel-logger is not different across different Kokkos backends for a particular Kokkos application like LAMMPS, but also is non-deterministic for a given Kokkos backend. Did I get that right from "comparing two versions of the code with roughly the same flow, but some different kernel calls here and there. Specifically, code paths with and without UnifiedMemory." ?

As for the compile error, that's because the headers are no longer consistent with the headers in Kokkos. I'll add a fix to that in a second.

Thanks and I realized just now that this is the issue that Kokkos Tools Github Issue #275 brings up.

@csiefer2
Copy link
Contributor Author

I think I may be actually misunderstanding the purpose. I thought you are handling a case where the output of kernel-logger is not different across different Kokkos backends for a particular Kokkos application like LAMMPS, but also is non-deterministic for a given Kokkos backend. Did I get that right from "comparing two versions of the code with roughly the same flow, but some different kernel calls here and there. Specifically, code paths with and without UnifiedMemory." ?

I'm interested using the tool to track down cases in the case code flow is deterministic, but the answer is not due to an unintended race condition.

using namespace Kokkos::Tools::Experimental;
std::string device_label("(");
ExecutionSpaceIdentifier eid = identifier_from_devid(deviceId);
if (eid.type == DeviceType::Serial)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a switch case.

if (varVal) {
std::string v = std::string(varVal);
// default to false
if (v == "1" || v == "ON" || v == "on" || v == "TRUE" || v == "true" ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the time, it seems that this repo wants either 0 or 1 for the value of an environment variable.

Then use atoi.

I think it is important that all environment variables follow consistent conventions.


bool suppressCounts() {
static bool value = [](){
const char* varVal = std::getenv("KOKKOS_TOOLS_LOGGER_SUPPRESS_COUNTS");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update your PR description in which you're currently mentioning KOKKOS_PROFILE_SUPPRESS_COUNTS.

"execution identifier %llu\n",
devID, (unsigned long long)(*kID));
deviceIdToString(devID).c_str(), (unsigned long long)(output));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait... What if we have several devices ? For instance, several Cuda GPUs.

Seems this change is really not going in the direction of allowing devices to be distinguished.

I would be OKish with:

Suggested change
deviceIdToString(devID).c_str(), (unsigned long long)(output));
printf(
"KokkosP: Executing parallel-scan kernel on device %d (%s) with unique "
"execution identifier %llu\n",
devID, deviceIdToString(devID).c_str(), (unsigned long long)(output));

"execution identifier %llu\n",
devID, (unsigned long long)(*kID));
deviceIdToString(devID).c_str(), (unsigned long long)(output));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
deviceIdToString(devID).c_str(), (unsigned long long)(output));
deviceIdToString(devID).c_str(), suppressCounts() ? 0 : *kID);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants