-
Notifications
You must be signed in to change notification settings - Fork 755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Implement work group memory extension #15178
Merged
Merged
Changes from all commits
Commits
Show all changes
97 commits
Select commit
Hold shift + click to select a range
652caa8
Preliminary implementation of work_group_memory extension
lbushi25 76daf77
Preliminary implementation of work_group_memory extension
lbushi25 21e082b
Implement work_group_memory extension
lbushi25 025cbc4
Implement work_group_memory extension
lbushi25 b94f7c9
Implement work group memory
lbushi25 0d6d694
Remove debug dumps
lbushi25 448071f
Merge branch 'intel:sycl' into work_group_memoy_new
lbushi25 9f2973a
Update work_group_memory.hpp
lbushi25 852315f
Remove include of deleted header file
lbushi25 4234022
Fix SPIRV compilation errors
lbushi25 ae5eb7e
Remove accidental change
lbushi25 8ce0280
Formatting changes
lbushi25 4ee31a5
Formatting changes
lbushi25 7b1b90b
Put the work group memory doc to supported
lbushi25 cf7476e
More formatting changes
lbushi25 50c0954
Delete sycl/include/sycl/ext/oneapi/experimental/test.cpp
lbushi25 44811b8
Yet more formatting changes
lbushi25 ad1046f
Merge branch 'work_group_memoy_new' of https://github.com/lbushi25/ll…
lbushi25 d343a2e
Fix warnings on Linux
lbushi25 3f1bc30
Remove unnecessary forward declaration from handler.hpp
lbushi25 103e233
Remove rvalue references in favor of const lvalue references
lbushi25 bfa5830
Fix syntax errors
lbushi25 2031478
Fix syntax errors
lbushi25 76f0acc
Don't explicitly make the work_group_memory class device-copyable as …
lbushi25 e0ad435
Don't explicitly make the work_group_memory class device-copyable as …
lbushi25 3513251
Remove some more unnecessary code
lbushi25 2cec997
Update work_group_memory.hpp
lbushi25 4c8b196
Update work_group_memory.hpp
lbushi25 a0b70e2
Formatting
lbushi25 ed8f125
Move doc to experimental folder
lbushi25 9460876
Update status section in doc
lbushi25 ac7130a
Merge branch 'work_group_memoy_new' of https://github.com/lbushi25/ll…
lbushi25 e2889b3
Final fixes
lbushi25 d6c78b9
Remove unnecessary include
lbushi25 8f7a07b
Add initial tests for work_group_memory extension
lbushi25 ae59899
Add E2E tests for work group memory
lbushi25 8cff603
Fix formatting
lbushi25 3ceead1
Resolve merge conflict
lbushi25 3228aeb
Revamp tests for work group memory extension
lbushi25 0e95ee5
Remove sanity test
lbushi25 52f13f0
Move extension doc to proposed
lbushi25 71d1013
Restore proposed status of work group memory doc
lbushi25 d48bc42
Fix unusd variable warning
lbushi25 f6515bc
Reduce test size to make sure UR does not run out or resources
lbushi25 3e4c73c
Replace sycl.hpp with core.hpp in the includes of E2E test
lbushi25 c84229e
Remove sycl.hpp include from tests
lbushi25 d2fddd8
Add support for unbounded arrays
lbushi25 0f677c2
Fix compilation errors
lbushi25 6ef823e
Improve swap test
lbushi25 6dc262a
Merge branch 'work_group_memoy_new' of https://github.com/lbushi25/ll…
lbushi25 4de6d50
Refactor CodeGenTypes.cpp changes
lbushi25 5653f04
Refactor CodeGenTypes.cpp changes
lbushi25 40eb63e
translate unbounded arrays to 1-sized arrays in LLVM IR in device com…
lbushi25 f6a0df7
Remove trailing spaces
lbushi25 026501c
Add unbounded array support by modifying LLVM IR -> SPIRV type lowering
lbushi25 2ce21b3
Revert CodeGenTypes.cpp changes
lbushi25 a9b2875
Fix merge conflicts
lbushi25 3821df4
Merge branch 'intel:sycl' into work_group_memoy_new
lbushi25 d73b0b1
Update SPIRVWriter.cpp
lbushi25 c1087ad
Merge branch 'work_group_memoy_new' of https://github.com/lbushi25/ll…
lbushi25 31481b8
Revert SPIRV translator changes
lbushi25 396169f
Revert SPIRV translator changes
lbushi25 dc37b2c
Merge branch 'intel:sycl' into work_group_memoy_new
lbushi25 236139f
Revert SPIRV translator changes
lbushi25 2beda8e
Stash changes
lbushi25 dbafe31
Add inital free function kernel support for work group memory
lbushi25 1b968df
merge latest upstream changes
lbushi25 e6b66c3
Apply suggestions
lbushi25 84ef6a8
Merge branch 'intel:sycl' into work_group_memoy_new
lbushi25 f24af09
Merge branch 'intel:sycl' into work_group_memoy_new
lbushi25 7dfa80b
Address reviews
lbushi25 3957cb5
Address reviews
lbushi25 91820d8
Address reviews
lbushi25 34bc23d
Update WorkGroupMemoryBackendArgument.cpp
lbushi25 3acf835
Update WorkGroupMemoryBackendArgument.cpp
lbushi25 604c640
Fix unit test implementation for backend kernel argument
lbushi25 5510208
Merge branch 'work_group_memoy_new' of https://github.com/lbushi25/ll…
lbushi25 d9418f9
Merge branch 'sycl' into work_group_memoy_new
lbushi25 e90a3b7
Add frontend tests
lbushi25 3b9a55a
Merge branch 'work_group_memoy_new' of https://github.com/lbushi25/ll…
lbushi25 b9ed6f4
Update work_group_memory.cpp
lbushi25 af08c19
Fix segmentation fault
lbushi25 b2a97a2
Merge branch 'work_group_memoy_new' of https://github.com/lbushi25/ll…
lbushi25 77a6de1
Move implementation details away from handler to handler_impl
lbushi25 3cb0ba4
Fix ABI breakage
lbushi25 1783f75
Update handler.hpp
lbushi25 5a6085f
Merge branch 'intel:sycl' into work_group_memoy_new
lbushi25 6affbc3
Add missing symbols to dumps
lbushi25 ed3c60f
Merge branch 'sycl' into work_group_memoy_new
lbushi25 fd89473
Fix compilation errors from changes in handler.hpp in another commit
lbushi25 24f87b0
Merge branch 'intel:sycl' into work_group_memoy_new
lbushi25 cba30a3
Merge branch 'intel:sycl' into work_group_memoy_new
lbushi25 3b10242
Refactor handler.hpp
lbushi25 f38f400
Revert "Refactor handler.hpp"
lbushi25 38a8d79
Fix typo in exception message
lbushi25 4df2f48
Improve frontend tests
lbushi25 9a7e3f1
Improve frontend tests
lbushi25 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
// RUN: %clang_cc1 -fsycl-is-device -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm %s -o %t.ll | ||
// RUN: FileCheck < %t.ll %s --check-prefix CHECK-IR | ||
// RUN: %clang_cc1 -fsycl-is-device -triple spir64-unknown-unknown -fsycl-int-header=%t.h %s | ||
// RUN: FileCheck < %t.h %s --check-prefix CHECK-INT-HEADER | ||
// | ||
// Tests for work_group_memory kernel parameter using the dummy implementation in Inputs/sycl.hpp. | ||
// The first two RUN commands verify that the init call is generated with the correct arguments in LLVM IR | ||
// and the second two RUN commands verify the contents of the integration header produced by the frontend. | ||
// | ||
// CHECK-IR: define dso_local spir_kernel void @ | ||
// CHECK-IR-SAME: ptr addrspace(3) noundef align 4 [[PTR:%[a-zA-Z0-9_]+]] | ||
// | ||
// CHECK-IR: [[PTR]].addr = alloca ptr addrspace(3), align 8 | ||
// CHECK-IR: [[PTR]].addr.ascast = addrspacecast ptr [[PTR]].addr to ptr addrspace(4) | ||
// CHECK-IR: store ptr addrspace(3) [[PTR]], ptr addrspace(4) [[PTR]].addr.ascast, align 8 | ||
// CHECK-IR: [[PTR_LOAD:%[a-zA-Z0-9_]+]] = load ptr addrspace(3), ptr addrspace(4) [[PTR]].addr.ascast, align 8 | ||
// | ||
// CHECK-IR: call spir_func void @{{.*}}__init{{.*}}(ptr addrspace(4) noundef align 8 dereferenceable_or_null(8) %{{[a-zA-Z0-9_]+}}, ptr addrspace(3) noundef [[PTR_LOAD]]) | ||
// | ||
// CHECK-INT-HEADER: const kernel_param_desc_t kernel_signatures[] = { | ||
// CHECK-INT-HEADER-NEXT: //--- _ZTSZZ4mainENKUlRN4sycl3_V17handlerEE_clES2_EUlNS0_4itemILi1EEEE_ | ||
// CHECK-INT-HEADER-NEXT: { kernel_param_kind_t::kind_work_group_memory, {{[4,8]}}, 0 }, | ||
// CHECK-INT-HEADER-EMPTY: | ||
// CHECK-INT-HEADER-NEXT: { kernel_param_kind_t::kind_invalid, -987654321, -987654321 }, | ||
// CHECK-INT-HEADER-NEXT: }; | ||
|
||
#include "Inputs/sycl.hpp" | ||
|
||
int main() { | ||
sycl::queue Q; | ||
Q.submit([&](sycl::handler &CGH) { | ||
sycl::work_group_memory<int> mem; | ||
sycl::range<1> ndr; | ||
CGH.parallel_for(ndr, [=](sycl::item<1> it) { int *ptr = &mem; }); | ||
}); | ||
return 0; | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
84 changes: 84 additions & 0 deletions
84
sycl/include/sycl/ext/oneapi/experimental/work_group_memory.hpp
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
//===-------------------- work_group_memory.hpp ---------------------------===// | ||
// | ||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
// See https://llvm.org/LICENSE.txt for license information. | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#pragma once | ||
|
||
#include <type_traits> | ||
|
||
namespace sycl { | ||
inline namespace _V1 { | ||
namespace detail { | ||
template <typename T> struct is_unbounded_array : std::false_type {}; | ||
|
||
template <typename T> struct is_unbounded_array<T[]> : std::true_type {}; | ||
|
||
template <typename T> | ||
inline constexpr bool is_unbounded_array_v = is_unbounded_array<T>::value; | ||
|
||
class work_group_memory_impl { | ||
public: | ||
work_group_memory_impl() : buffer_size{0} {} | ||
work_group_memory_impl(const work_group_memory_impl &rhs) = default; | ||
work_group_memory_impl & | ||
operator=(const work_group_memory_impl &rhs) = default; | ||
work_group_memory_impl(size_t buffer_size) : buffer_size{buffer_size} {} | ||
|
||
private: | ||
size_t buffer_size; | ||
friend class sycl::handler; | ||
}; | ||
|
||
} // namespace detail | ||
namespace ext::oneapi::experimental { | ||
|
||
template <typename DataT, typename PropertyListT = empty_properties_t> | ||
class __SYCL_SPECIAL_CLASS __SYCL_TYPE(work_group_memory) work_group_memory | ||
: sycl::detail::work_group_memory_impl { | ||
public: | ||
using value_type = std::remove_all_extents_t<DataT>; | ||
|
||
private: | ||
using decoratedPtr = typename sycl::detail::DecoratedType< | ||
value_type, access::address_space::local_space>::type *; | ||
|
||
public: | ||
work_group_memory() = default; | ||
work_group_memory(const work_group_memory &rhs) = default; | ||
work_group_memory &operator=(const work_group_memory &rhs) = default; | ||
template <typename T = DataT, | ||
typename = std::enable_if_t<!sycl::detail::is_unbounded_array_v<T>>> | ||
work_group_memory(handler &) | ||
: sycl::detail::work_group_memory_impl(sizeof(DataT)) {} | ||
template <typename T = DataT, | ||
typename = std::enable_if_t<sycl::detail::is_unbounded_array_v<T>>> | ||
work_group_memory(size_t num, handler &) | ||
: sycl::detail::work_group_memory_impl( | ||
num * sizeof(std::remove_extent_t<DataT>)) {} | ||
template <access::decorated IsDecorated = access::decorated::no> | ||
multi_ptr<value_type, access::address_space::local_space, IsDecorated> | ||
get_multi_ptr() const { | ||
return sycl::address_space_cast<access::address_space::local_space, | ||
IsDecorated, value_type>(ptr); | ||
} | ||
DataT *operator&() const { return reinterpret_cast<DataT *>(ptr); } | ||
operator DataT &() const { return *reinterpret_cast<DataT *>(ptr); } | ||
template <typename T = DataT, | ||
typename = std::enable_if_t<!std::is_array_v<T>>> | ||
const work_group_memory &operator=(const DataT &value) const { | ||
*ptr = value; | ||
return *this; | ||
} | ||
#ifdef __SYCL_DEVICE_ONLY__ | ||
void __init(decoratedPtr ptr) { this->ptr = ptr; } | ||
#endif | ||
private: | ||
decoratedPtr ptr; | ||
}; | ||
} // namespace ext::oneapi::experimental | ||
} // namespace _V1 | ||
} // namespace sycl |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at this closely, but this looks suspicious. The
work_group_memory
object is required to correspond to just a single Level Zero kernel parameter, which is a pointer to work-group-local memory. This is the requirement specified here:https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc#dpc-guaranteed-compatibility-with-level-zero-and-opencl-backends
It looks like
work_group_memory_impl
has two member variables of typesize_t
. In addition, thework_group_memory
type has a member variable of pointer type. Does this mean that the Level Zero kernel will end up with three kernel parameters?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial testing shows that the L0 kernel only ends up with one parameter, but I will make this into a proper test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why there is only one Level Zero parameter. Is it because
wgm_size
andbuffer_size
are optimized away in the device code? I'd be concerned about this because the requirement of one Level Zero parameter holds even when optimization is disabled.When I look at the code, I don't see any uses of
buffer_size
. Is that still needed?If all we need is
wgm_size
, I wonder if it would be safer to implementwork_group_memory
such that its only data member was a union:Or, you could use
std::variant
instead of a union in a similar way.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, my understanding is that the
processArg
function inhandler.cpp
file defines a mapping between the SYCL kernel parameters and the L0 kernel arguments. In the case ofwork_group_memory
, I've defined the mapping to be such that whenever the runtime sees a work group memory parameter passed to a SYCL kernel, instead map that to a local memory buffer on the underlying backend where the size of the buffer is given by thebuffer_size
member of the work group memory object. If you look at my changes inprocessArg
function, thats exactly what I'm doing. Therefore, with or without optimization, I believe there will only be one L0 kernel argument per work group memory object.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relevant lines are 793-798 in
handler.cpp
file.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks for the explanation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lbushi25 is right that this will only result in a single kernel argument, which is due to it being a "SYCL special class" so the arguments of its
__init
function is used as the arguments of the kernel and the function body is then used for constructing the object.All that said however, I also have some concerns here:
These members are public and by public inheritance they will also be public in@AlexeySachkov corrected me and he's right. Since the inheritor is awork_group_memory
. They should not be, however.class
the default inheritance is private.device
) where we have astd::shared_ptr<work_group_memory_impl>
in this case, wherework_group_memory_impl
would be moved to a source file. Then, we could have something like:Note that if we're sure we only ever need the two
size_t
, I don't think there's a big problem in having them directly in the class. Reading through it though, I don't fully understand why we need twosize_t
. If we could reduce it to one, we could do like Greg suggests further up.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I was also confused at first, so I think it worth explicitly declaring members of the base class as
protected