Copyright © 2023 Intel Corporation. All rights reserved.
Khronos® is a registered trademark and SYCL™ and SPIR™ are trademarks of The Khronos Group Inc. OpenCL™ is a trademark of Apple Inc. used by permission by Khronos.
To report problems with this extension, please open a new issue at:
This extension is written against the SYCL 2020 revision 7 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision.
This is a proposed extension specification, intended to gather community feedback. Interfaces defined in this specification may not be implemented yet or may be in a preliminary state. The specification itself may also change in incompatible ways before it is finalized. Shipping software products should not rely on APIs defined in this specification.
SYCL 2020 provides multiple ways to enqueue work to a device. In some cases, a
single function name is used to enqueue kernels with very different use-cases
and execution models (e.g., parallel_for(range) and parallel_for(nd_range)).
In almost all cases, the functions are available in multiple places (e.g.,
queue::parallel_for and handler::parallel_for). In all cases, these functions
return an event object by default, which has been shown to introduce
undesirable performance overhead.
Additionally, we have received feedback from developers and implementers alike
that the number of parallel_for overloads is confusing, and that the way
reductions in particular are specified (as a parameter pack containing both
sycl::reduction objects and a kernel lambda) is problematic.
This extension addresses these issues by:
- Using different function names for different use-cases.
- Using free functions instead of member functions.
- Requiring developers to opt in to the creation of event objects.
- Bundling everything related to a kernel’s launch configuration (i.e., its range, any launch properties) into a single object.
- Moving the reductions parameter pack after the kernel.
This extension makes SYCL simpler and easier to document. It is also expected
to improve the performance of many SYCL applications, where event objects are
not required to describe application behavior.
All functions proposed in this extension accept as their first argument an object that represents where a command should be submitted, allowing the new functions to be used either at command-group scope or as a replacement for existing queue shortcuts. A future version of this extension may adjust this overload set to include functions compatible with future C++ concepts (e.g., by accepting a scheduler and returning a sender).
The example below demonstrates that the syntax proposed here requires only minor changes to existing applications, while retaining their structure.
// SYCL 2020
q.submit([&](sycl::handler& h) {
  sycl::accessor result { buf, h, sycl::write_only, sycl::no_init };
  h.parallel_for(1024, [=](sycl::id<1> idx) {
    result[idx] = idx;
  });
});

// The pointer type matches the element type passed to malloc_shared.
int* output = sycl::malloc_shared<int>(1, q);
*output = 0;
std::vector<sycl::event> depEvents = /* some dependencies */;
sycl::event e = q.parallel_for(sycl::nd_range<1>{1024, 16}, depEvents,
    sycl::reduction(output, sycl::plus<>()),
    [=](sycl::nd_item<1> it, auto& sum) {
      sum += it.get_global_id();
    });
e.wait();
sycl::free(output, q);
// SYCL 2020 with this extension
namespace syclex = sycl::ext::oneapi::experimental;

syclex::submit(q, [&](sycl::handler& h) {
  sycl::accessor result { buf, h, sycl::write_only, sycl::no_init };
  syclex::parallel_for(h, 1024, [=](sycl::id<1> idx) {
    result[idx] = idx;
  });
});

// The pointer type matches the element type passed to malloc_shared.
int* output = sycl::malloc_shared<int>(1, q);
*output = 0;
std::vector<sycl::event> depEvents = /* some dependencies */;
sycl::event e = syclex::submit_with_event(q, [&](sycl::handler& h) {
  h.depends_on(depEvents);
  // The kernel comes first; reductions follow as a trailing parameter pack.
  syclex::nd_launch(h, sycl::nd_range<1>{1024, 16},
      [=](sycl::nd_item<1> it, auto& sum) {
        sum += it.get_global_id();
      },
      sycl::reduction(output, sycl::plus<>()));
});
e.wait();
sycl::free(output, q);
This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro SYCL_EXT_ONEAPI_ENQUEUE_FUNCTIONS to one of the values defined in the
table below. Applications can test for the existence of this macro to
determine if the implementation supports this feature, or applications can test
the macro’s value to determine which of the extension’s features the
implementation supports.
Value | Description |
---|---|
1 | The APIs of this experimental extension are not versioned, so the feature-test macro always has this value. |
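As a minimal sketch of how an application might use the macro, the fragment below records at compile time whether the extension is present. The names has_enqueue_functions, enqueue_functions_version and enqueue_path are hypothetical helpers introduced only for illustration. The sketch assumes that, in a real application, the SYCL header is included before the check so that an implementation which defines the macro has had the chance to do so.

```cpp
// In a real application, include <sycl/sycl.hpp> before this check
// (assumption: the implementation makes the macro visible by then).
#if defined(SYCL_EXT_ONEAPI_ENQUEUE_FUNCTIONS)
constexpr bool has_enqueue_functions = true;
constexpr int enqueue_functions_version = SYCL_EXT_ONEAPI_ENQUEUE_FUNCTIONS;
#else
constexpr bool has_enqueue_functions = false;
constexpr int enqueue_functions_version = 0;  // 0 means "not available" here
#endif

// Hypothetical helper: names the code path an application would take.
constexpr const char* enqueue_path() {
  return has_enqueue_functions ? "extension free functions"
                               : "SYCL 2020 member functions";
}
```

Code that depends on the extension can then be guarded with the same preprocessor test, with the SYCL 2020 member functions as the fallback path.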
A launch configuration object of type launch_config is used to bundle
together all components of a kernel’s launch configuration, including:
- The range of execution.
- Any compile-time properties.
Any compile-time properties passed as part of a launch_config only affect the
way in which the kernel is launched. They cannot be used to define information
about the kernel itself. This extension does not define any properties for
launch_config, but other extensions are expected to define such properties.
[Note: The properties defined in the sycl_ext_oneapi_kernel_properties
extension (e.g., work_group_size) cannot be used via launch_config. In
order to use these properties with a kernel, the kernel must be a named
function object which exposes the properties via
get(sycl::ext::oneapi::experimental::properties_tag) as described in that
extension. — end note]
namespace sycl::ext::oneapi::experimental {

// Available only when Range is sycl::range or sycl::nd_range
template <typename Range, typename Properties = empty_properties_t>
class launch_config {
 public:
  launch_config(Range, Properties = {});
};

}
launch_config(Range, Properties);
Constraints: Available only if Range is a sycl::range or sycl::nd_range, and
Properties is a compile-time property list.
Effects: Constructs a launch_config from the specified range and properties.
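The constraint on Range can be illustrated with a small stand-alone model. This is only a sketch in standard C++: the mock namespace, the one-dimensional range and nd_range stand-ins, the is_launch_range trait and the get_range accessor are all assumptions made for illustration and are not part of the extension or of any SYCL implementation.

```cpp
#include <cstddef>
#include <type_traits>

namespace mock {

// Stand-ins for sycl::range and sycl::nd_range (one dimension, for brevity).
struct range { std::size_t size; };
struct nd_range { range global; range local; };
struct empty_properties_t {};

// Hypothetical trait: true only for the two accepted range types.
template <typename T>
struct is_launch_range
    : std::integral_constant<bool, std::is_same<T, range>::value ||
                                   std::is_same<T, nd_range>::value> {};

template <typename Range, typename Properties = empty_properties_t>
class launch_config {
  // Models "Available only when Range is a range or an nd_range".
  static_assert(is_launch_range<Range>::value,
                "Range must be a range or an nd_range");

 public:
  launch_config(Range r, Properties p = {}) : range_(r), properties_(p) {}

  // Hypothetical accessor, not part of the extension's synopsis.
  const Range& get_range() const { return range_; }

 private:
  Range range_;
  Properties properties_;
};

}  // namespace mock
```

In this model, `mock::launch_config<mock::nd_range> cfg(mock::nd_range{{1024}, {16}});` compiles, whereas `mock::launch_config<int>` is rejected at compile time by the static_assert.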
When specifying event dependencies or requesting the creation of events, commands must be wrapped in a command-group.
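The relationship between queue shortcuts, command-groups and opt-in events can be sketched with a toy model in standard C++. Everything in the mock namespace below is hypothetical: the types only imitate the shape of the API, work runs eagerly on the host, and the assumption that a queue shortcut behaves like a command-group wrapping the handler overload is made for illustration rather than taken from the specification text.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

namespace mock {

struct event { std::size_t id; };  // toy event: just a submission counter

struct handler {
  std::vector<std::function<void()>> commands;  // work recorded in the group
};

struct queue {
  std::size_t submitted = 0;  // number of command-groups submitted so far
};

// Command-group scope: record the kernel on the handler.
template <typename Kernel>
void parallel_for(handler& h, std::size_t n, Kernel k) {
  h.commands.push_back([=] {
    for (std::size_t i = 0; i < n; ++i) k(i);
  });
}

// No event is created unless the caller asks for one.
template <typename CommandGroupFunc>
void submit(queue& q, CommandGroupFunc&& cgf) {
  handler h;
  cgf(h);
  for (auto& command : h.commands) command();  // the toy model runs eagerly
  ++q.submitted;
}

// Opt-in variant: same submission, but an event is returned.
template <typename CommandGroupFunc>
event submit_with_event(queue& q, CommandGroupFunc&& cgf) {
  submit(q, std::forward<CommandGroupFunc>(cgf));
  return event{q.submitted};
}

// Queue shortcut: same name, selected by the type of the first argument.
template <typename Kernel>
void parallel_for(queue& q, std::size_t n, Kernel k) {
  submit(q, [&](handler& h) { parallel_for(h, n, k); });
}

}  // namespace mock
```

In this model a caller writes `mock::parallel_for(q, n, kernel)` when no event is needed, and wraps the same call in `mock::submit_with_event` only when an event must be returned.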
Effects: Submits a command-group function object (as defined by the SYCL
specification) to the |
|
Effects: Submits a command-group function object (as defined by the SYCL
specification) to the Returns: A |
Effects: Enqueues a kernel function to the |
|
Effects: Enqueues a kernel object to the |
Constraints: The parameter pack consists of 0 or more objects created by the
Effects: Enqueues a kernel function to the |
|
Constraints: The parameter pack consists of 0 or more objects created by the
Effects: Enqueues a kernel function to the |
|
Effects: Enqueues a kernel object to the |
|
Effects: Enqueues a kernel object to the |
Constraints: The parameter pack consists of 0 or more objects created by the
Effects: Enqueues a kernel function to the |
|
Constraints: The parameter pack consists of 0 or more objects created by the
Effects: Enqueues a kernel function to the |
|
Effects: Enqueues a kernel object to the |
|
Effects: Enqueues a kernel object to the |
Effects: Enqueues a |
|
Effects: Enqueues a |
|
Effects: Enqueues a |
|
Effects: Enqueues a |
|
Effects: Enqueues a |
|
Effects: Enqueues a |
The functions in this section are only available if the sycl_ext_oneapi_enqueue_barrier extension is supported.
Effects: Enqueues a command barrier to the |
|
Effects: Enqueues a partial command barrier to the [Note: If |
- What should submit_with_event be called?

  UNRESOLVED: submit_with_event is descriptive but verbose. Synonyms for submit like enqueue do not obviously mean "return an event". record may be confused with the recording functionality associated with SYCL graphs.
- What about accessor overloads and update_host?

  UNRESOLVED: Supporting accessor overloads with this new approach is possible, but additional design work is required to understand how to handle placeholder accessors. Whether update_host should be exposed via this new free-function interface is an open question.