Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xpu/ocl: Minor generic-vendor fixes #2084

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/xpu/ocl/engine_factory.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ class engine_factory_t : public impl::engine_factory_t {
std::vector<cl_device_id> ocl_devices;
status_t status
= xpu::ocl::get_devices(&ocl_devices, CL_DEVICE_TYPE_GPU);
if (status != status::success) return status;
if (status != status::success) return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agreed that we should not be returning status here, it is not a device count. On the other hand, I think it is important to maintain the distinction between expected errors and unexpected errors (such as the CL_PLATFORM_NOT_FOUND_KHR result from the clGetPlatformIDs call). What if we keep the current get_devices behavior, but use VERROR to log the unexpected error case before returning 0.

return ocl_devices.size();
}

Expand Down
20 changes: 8 additions & 12 deletions src/xpu/ocl/utils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -190,21 +190,18 @@ static bool is_intel_platform(cl_platform_id platform) {
}

status_t get_devices(std::vector<cl_device_id> *devices,
cl_device_type device_type, cl_uint vendor_id /* = 0x8086 */) {
cl_device_type device_type) {
cl_uint num_platforms = 0;

cl_int err = clGetPlatformIDs(0, nullptr, &num_platforms);
// No platforms - a valid scenario
if (err == CL_PLATFORM_NOT_FOUND_KHR) return status::success;
if (err == CL_PLATFORM_NOT_FOUND_KHR) return status::runtime_error;

OCL_CHECK(err);

std::vector<cl_platform_id> platforms(num_platforms);
OCL_CHECK(clGetPlatformIDs(num_platforms, &platforms[0], nullptr));

for (size_t i = 0; i < platforms.size(); ++i) {
if (!is_intel_platform(platforms[i])) continue;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that getting the intel platform checks out of xpu::ocl makes sense, but we still need some filtering to prevent non-intel platforms from dispatching into the code in src/gpu/intel. Because of this, it seems like we need to add a check for the intel platform before the call to gpu::intel::ocl::engine_create.

In addition, I don't think we can enable the non-intel runtimes to dispatch into this code either as it is not currently supported and there is no testing infrastructure to ensure it works. We will need an RFC on how multiple runtimes should be supported before that can be considered. I get that this functionality is useful for your exploration of oneDNN though, maybe we can add a development variable guard in gpu::intel::ocl::engine_create (similar to here, and documented here) for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This change breaks the current contract that get_devices returns only intel devices and therefore it will affect all Intel-specific GPU code that relies on that.
  • I don't see a problem with asking get_devices to filter out devices and platforms as it has to be done somewhere anyway. The is_intel_platform function merely checks the name of the platform so it doesn't add any dependency on Intel-specific API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjoursler:

we still need some filtering to prevent non-intel platforms from dispatching into the code in src/gpu/intel [...] check for the intel platform before the call to gpu::intel::ocl::engine_create.

I like that. Would it make sense to then remove the consistency check in check_device? That's currently the thing preventing us from binding to rusticl on otherwise-Intel hardware, so it needs to be kept somewhere, but I think if we assert is_intel_platform before engine_create we wouldn't need to verify it later.

In addition, I don't think we can enable the non-intel runtimes to dispatch into this code either as it is not currently supported and there is no testing infrastructure to ensure it works.

As of 7cfc834 (a few days old, just happens to be checked out), building the OCL engine for the GENERIC vendor does not produce a working libdnnl.so:

[23/359] Linking CXX executable tests/gtests/test_binary
FAILED: tests/gtests/test_binary 
: && /usr/bin/c++ -fopenmp -fvisibility-inlines-hidden  -Wall -Wno-unknown-pragmas -fvisibility=internal -Wno-ignored-attributes -fPIC -Wformat -Wformat-security -fstack-protector-strong     -Wno-strict-overflow -Wno-maybe-uninitialized -O2 -g -DNDEBUG -pie -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now tests/gtests/CMakeFiles/test_binary.dir/main.cpp.o tests/gtests/CMakeFiles/test_binary.dir/__/test_thread.cpp.o tests/gtests/CMakeFiles/test_binary.dir/test_binary.cpp.o -o tests/gtests/test_binary  -Wl,-rpath,/var/home/ajax/git/oneDNN/build/src  src/libdnnl.so.3.6  tests/gtests/gtest/libdnnl_gtest.a  /usr/lib64/libOpenCL.so  -ldl  -lpthread && :
/usr/bin/ld: tests/gtests/CMakeFiles/test_binary.dir/test_binary.cpp.o: in function `dnnl::ocl_interop::make_memory(dnnl::memory::desc const&, dnnl::engine const&, dnnl::ocl_interop::memory_kind, void*)':
/var/home/ajax/git/oneDNN/include/oneapi/dnnl/dnnl_ocl.hpp:271:(.text._ZN4test11make_memoryERKN4dnnl6memory4descERKNS0_6engineE[_ZN4test11make_memoryERKN4dnnl6memory4descERKNS0_6engineE]+0x6e): undefined reference to `dnnl_ocl_interop_memory_create'
/usr/bin/ld: src/libdnnl.so.3.6: undefined reference to `dnnl::impl::xpu::ocl::get_devices(std::vector<_cl_device_id*, std::allocator<_cl_device_id*> >*, unsigned long, unsigned int)'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

Even if it would link, the engine factory would only call the rest of the engine ctor if we were built for INTEL, and GENERIC will just throw a runtime error. All of which to say: I agree it ought to be tested once it's enabled, but that configuration is still pretty unreachable.

@densamoilov:

This change breaks the current contract that get_devices returns only intel devices and therefore it will affect all Intel-specific GPU code that relies on that.

Meh? I don't think that's an especially good contract, certainly not in code that doesn't have intel in the pathname, and (per the above) it can't matter yet as the GENERIC configuration does not reach clCreateContext. In the supported INTEL configuration we will still only initialize on cl_devices from the Intel platform, so the only visible change is that the device array would have more things in it. But (afaict) at runtime we only really operate in terms of engines not devices. So it's hard for me to see that there is any reliance on that invariant.

I don't see a problem with asking get_devices to filter out devices and platforms as it has to be done somewhere anyway. The is_intel_platform function merely checks the name of the platform so it doesn't add any dependency on Intel-specific API.

If the list of devices needs to be filtered then the place to do it is in the place that wants the filtered view. If the GENERIC configuration can be made to work, I could easily imagine something like rusticl for local work generation plus pocld to distribute long-lived work units. That means I will want a list of devices from more than one OpenCL platform, from all of them in fact. And unfortunately the "device index" needs to be unique across the engine kind which means if I want to talk to two OpenCL platforms then I need to assign globally unique indexes to their devices anyway. I may as well just collect them all here, and (per the above) apply the platform filter before finishing device init.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's an especially good contract

My point is, the contract exists and therefore if you want to change it you will have to adjust the code that relies on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is, the contract exists and therefore if you want to change it you will have to adjust the code that relies on it.

For an example, consider this location.

Copy link
Contributor

@rjoursler rjoursler Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that. Would it make sense to then remove the consistency check in check_device?

That check looks like it doesn't belongs in xpu::ocl anyway. It should likely be moved to intel::ocl::ocl_gpu_engine_t::init() and use VCONDCHECK to log the creation failure. With this design, there is no need to guard the call to engine_create either, as creation failure will be enforced by the intel::ocl_gpu_engine_t structure.

cl_uint num_devices = 0;
cl_int err = clGetDeviceIDs(
platforms[i], device_type, 0, nullptr, &num_devices);
Expand All @@ -219,17 +216,16 @@ status_t get_devices(std::vector<cl_device_id> *devices,
OCL_CHECK(clGetDeviceIDs(platforms[i], device_type, num_devices,
&plat_devices[0], nullptr));

// Use the devices for the requested vendor only.
for (size_t j = 0; j < plat_devices.size(); ++j) {
cl_uint v_id;
OCL_CHECK(clGetDeviceInfo(plat_devices[j], CL_DEVICE_VENDOR_ID,
sizeof(cl_uint), &v_id, nullptr));
if (v_id == vendor_id) { devices->push_back(plat_devices[j]); }
devices->push_back(plat_devices[j]);
}
}
}
// No devices found but still return success
return status::success;

if (devices->size() != 0)
return status::success;

return status::runtime_error;
}

status_t get_devices(std::vector<cl_device_id> *devices,
Expand Down
2 changes: 1 addition & 1 deletion src/xpu/ocl/utils.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -273,7 +273,7 @@ struct ext_func_t {
std::string get_kernel_name(cl_kernel kernel);

status_t get_devices(std::vector<cl_device_id> *devices,
cl_device_type device_type, cl_uint vendor_id = 0x8086);
cl_device_type device_type);

status_t get_devices(std::vector<cl_device_id> *devices,
std::vector<wrapper_t<cl_device_id>> *sub_devices,
Expand Down
Loading