Skip to content

Commit a13d57a

Browse files
authored
[SYCL][JM] Add Panther Lake (PTL) support to joint matrix query and aspect (#16885)
Also, as part of this PR, I added missing references to architectures we recently added (BMG, LNL)
1 parent f9eccc2 commit a13d57a

File tree

5 files changed

+47
-25
lines changed

5 files changed

+47
-25
lines changed

sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc

+3-1
Original file line numberDiff line numberDiff line change
@@ -490,7 +490,9 @@ with the machine learning types, `T` should be the element type
490490
==== Appendix: Restrictions Per Hardware
491491
===== Intel XMX
492492
The checked APIs are currently available in devices with the architecture
493-
`architecture::intel_gpu_pvc`. The following restrictions apply to
493+
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
494+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`, or
495+
`architecture::intel_gpu_ptl_u`. The following restrictions apply to
494496
these checked APIs:
495497

496498
- The `stride` parameter has the following restrictions:

sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc

+34-18
Original file line numberDiff line numberDiff line change
@@ -57,8 +57,8 @@ optional kernel features as defined in section 5.7 of the core SYCL
5757
specification. Each device supports only certain values for the `M`,
5858
`N`, and `K` template parameters and only certain types for the `Ta`,
5959
`Tb`, and `Tc` template parameters. Applications can use the query API
60-
in `matrix_params` or
61-
`get_info<ext::oneapi::experimental::info::device::matrix_combinations>`
60+
in `matrix_params` or
61+
`get_info<ext::oneapi::experimental::info::device::matrix_combinations>`
6262
to determine the set of legal parameters for each device. If the
6363
application submits a kernel using an unsupported `joint_matrix` type
6464
or calls `joint_matrix_mad` with an unsupported combination, the
@@ -269,7 +269,7 @@ The two last overloads of `joint_matrix_load` take
269269
`sycl::ext::oneapi::experimental::annotated_ptr` as argument instead
270270
of `sycl::multi_ptr`. The property list associated with the
271271
`annotated_ptr` argument represents the compile-time constant
272-
properties for cache control included in the SYCL extenion
272+
properties for cache control included in the SYCL extension
273273
link:../../proposed/sycl_ext_intel_cache_controls.asciidoc[sycl_ext_intel_cache_controls]
274274
as illustrated in the example below.
275275

@@ -1109,43 +1109,49 @@ This is currently available in devices with the architecture
11091109
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
11101110
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_dg2_g10`,
11111111
`architecture::intel_gpu_dg2_g11`, `architecture::intel_gpu_dg2_g12`,
1112-
and `architecture::intel_gpu_arl_h`.
1112+
`architecture::intel_gpu_arl_h`, `architecture::intel_gpu_ptl_h`, and
1113+
`architecture::intel_gpu_ptl_u`.
11131114

11141115
[frame="none",options="header"]
11151116
|======================
11161117
| A type | B type | C type | D type | M | N | K | device
11171118
.2+| `matrix_type::uint8` .2+| `matrix_type::uint8` .2+|
11181119
`matrix_type::sint32` .2+| `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32
11191120
|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1120-
`architecture::intel_gpu_lnl_m`
1121+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1122+
`architecture::intel_gpu_ptl_u`
11211123
|8|`architecture::intel_gpu_dg2_g10,
11221124
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
11231125
`architecture::intel_gpu_arl_h`
11241126
.2+| `matrix_type::uint8` .2+| `matrix_type::sint8` .2+|
11251127
`matrix_type::sint32` .2+|`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
11261128
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1127-
`architecture::intel_gpu_lnl_m`
1129+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1130+
`architecture::intel_gpu_ptl_u`
11281131
|8|`architecture::intel_gpu_dg2_g10,
11291132
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
11301133
`architecture::intel_gpu_arl_h`
11311134
.2+| `matrix_type::sint8` .2+| `matrix_type::uint8` .2+|
11321135
`matrix_type::sint32` .2+|`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
11331136
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1134-
`architecture::intel_gpu_lnl_m`
1137+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1138+
`architecture::intel_gpu_ptl_u`
11351139
|8|`architecture::intel_gpu_dg2_g10,
11361140
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
11371141
`architecture::intel_gpu_arl_h`
11381142
.2+| `matrix_type::sint8` .2+| `matrix_type::sint8` .2+|
11391143
`matrix_type::sint32` .2+| `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
11401144
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1141-
`architecture::intel_gpu_lnl_m`
1145+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1146+
`architecture::intel_gpu_ptl_u`
11421147
|8|`architecture::intel_gpu_dg2_g10,
11431148
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
11441149
`architecture::intel_gpu_arl_h`
11451150
.8+|`matrix_type::fp16` .8+| `matrix_type::fp16` .8+|
11461151
`matrix_type::fp32` .8+|`matrix_type::fp32` .1+| 16 .1+| 16 | 16
11471152
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1148-
`architecture::intel_gpu_lnl_m`
1153+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1154+
`architecture::intel_gpu_ptl_u`
11491155
.2+| 1 .2+| 64 | 16 |32
11501156
.2+| 32 .2+| 64 | 16 |32
11511157
.2+| +<=+ 8 | 16 .2+| 16
@@ -1156,24 +1162,28 @@ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
11561162
.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
11571163
`matrix_type::fp16` .6+|`matrix_type::fp32` .1+| +<=+ 8 | 16 .1+| 16
11581164
.6+| `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1159-
`architecture::intel_gpu_lnl_m`
1165+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1166+
`architecture::intel_gpu_ptl_u`
11601167
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
11611168
.2+| 32 .2+| 64 | 16 | 32
11621169
.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
11631170
`matrix_type::fp32` .6+|`matrix_type::fp16` .1+| +<=+ 8 | 16 .1+| 16
11641171
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1165-
`architecture::intel_gpu_lnl_m`
1172+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1173+
`architecture::intel_gpu_ptl_u`
11661174
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
11671175
.2+| 32 .2+| 64 |16 | 32
11681176
.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
11691177
`matrix_type::fp16` .6+|`matrix_type::fp16` .1+| +<=+ 8 | 16 .1+| 16
11701178
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1171-
`architecture::intel_gpu_lnl_m`
1179+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1180+
`architecture::intel_gpu_ptl_u`
11721181
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 |32 .2+| 32 .2+| 64 | 16 | 32
11731182
.8+| `matrix_type::bf16` .8+| `matrix_type::bf16` .8+|
11741183
`matrix_type::fp32` .8+| `matrix_type::fp32` | 16 | 16 | 16
11751184
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1176-
`architecture::intel_gpu_lnl_m`
1185+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1186+
`architecture::intel_gpu_ptl_u`
11771187
.2+| 1 .2+| 64 | 16 | 32
11781188
.2+| 32 .2+| 64 | 16 |32
11791189
.2+| +<=+ 8 | 16 .2+| 16
@@ -1184,28 +1194,34 @@ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
11841194
.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
11851195
`matrix_type::bf16` .6+|`matrix_type::fp32` .1+| +<=+ 8 | 16 .1+| 16 .6+|
11861196
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1187-
`architecture::intel_gpu_lnl_m`
1197+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1198+
`architecture::intel_gpu_ptl_u`
11881199
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
11891200
.2+| 32 .2+| 64 |16 | 32
11901201
.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
11911202
`matrix_type::fp32` .6+|`matrix_type::bf16` .1+| +<=+ 8 | 16 .1+| 16 .6+|
11921203
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1193-
`architecture::intel_gpu_lnl_m`
1204+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1205+
`architecture::intel_gpu_ptl_u`
11941206
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
11951207
.2+| 32 .2+| 64 |16 | 32
11961208
.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
11971209
`matrix_type::bf16` .6+|`matrix_type::bf16` .1+| +<=+ 8 | 16 .1+| 16 .6+|
11981210
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1199-
`architecture::intel_gpu_lnl_m`
1211+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1212+
`architecture::intel_gpu_ptl_u`
12001213
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
12011214
.2+| 32 .2+| 64 |16 | 32
12021215
| `matrix_type::tf32` | `matrix_type::tf32` |
12031216
`matrix_type::fp32` .2+| `matrix_type::fp32` | +<=+ 8 | 16 | 8 |
12041217
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
1205-
`architecture::intel_gpu_lnl_m`
1218+
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
1219+
`architecture::intel_gpu_ptl_u`
12061220
|======================
12071221

1208-
===== Restrictions on `architecture::intel_gpu_pvc`
1222+
===== Restrictions on `architecture::intel_gpu_pvc`,
1223+
`architecture::intel_gpu_bmg_g21`, `architecture::intel_gpu_lnl_m`,
1224+
`architecture::intel_gpu_ptl_h`, and `architecture::intel_gpu_ptl_u`
12091225

12101226
- The `stride` parameter to `joint_matrix_load` and
12111227
`joint_matrix_store` has the following restrictions:

sycl/source/detail/device_impl.cpp

+1
Original file line numberDiff line numberDiff line change
@@ -684,6 +684,7 @@ bool device_impl::has(aspect Aspect) const {
684684
arch::intel_gpu_dg2_g10, arch::intel_gpu_dg2_g11,
685685
arch::intel_gpu_dg2_g12, arch::intel_gpu_bmg_g21,
686686
arch::intel_gpu_lnl_m, arch::intel_gpu_arl_h,
687+
arch::intel_gpu_ptl_h, arch::intel_gpu_ptl_u,
687688
};
688689
try {
689690
return std::any_of(

sycl/source/detail/device_info.hpp

+3-1
Original file line numberDiff line numberDiff line change
@@ -868,7 +868,9 @@ struct get_device_info_impl<
868868
};
869869
else if ((architecture::intel_gpu_pvc == DeviceArch) ||
870870
(architecture::intel_gpu_bmg_g21 == DeviceArch) ||
871-
(architecture::intel_gpu_lnl_m == DeviceArch)) {
871+
(architecture::intel_gpu_lnl_m == DeviceArch) ||
872+
(architecture::intel_gpu_ptl_h == DeviceArch) ||
873+
(architecture::intel_gpu_ptl_u == DeviceArch)) {
872874
std::vector<ext::oneapi::experimental::matrix::combination> pvc_combs = {
873875
{8, 0, 0, 0, 16, 32, matrix_type::uint8, matrix_type::uint8,
874876
matrix_type::sint32, matrix_type::sint32},

sycl/test-e2e/Basic/AMX_aspect.cpp renamed to sycl/test-e2e/matrix_aspect.cpp

+6-5
Original file line numberDiff line numberDiff line change
@@ -2,11 +2,10 @@
22
// RUN: %{run-unfiltered-devices} %t.out
33
//
44

5-
//==--------------- AMX_aspect.cpp - SYCL device test
6-
//------------------------==//
5+
//==--------------- matrix_aspect.cpp - SYCL device test--------------------==//
76
//
87
// Checks that the has(aspect) method on a device returns the correct answer
9-
// when queried about ext_intel_matrix AMX aspect.
8+
// when queried about ext_intel_matrix joint matrix aspect.
109
//
1110
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
1211
// See https://llvm.org/LICENSE.txt for license information.
@@ -21,8 +20,10 @@ using namespace sycl;
2120
using arch = sycl::ext::oneapi::experimental::architecture;
2221
int main() {
2322
const std::vector<arch> supported_archs = {
24-
arch::intel_cpu_spr, arch::intel_gpu_pvc, arch::intel_gpu_dg2_g10,
25-
arch::intel_gpu_dg2_g11, arch::intel_gpu_dg2_g12};
23+
arch::intel_cpu_spr, arch::intel_cpu_gnr, arch::intel_cpu_dmr,
24+
arch::intel_gpu_pvc, arch::intel_gpu_dg2_g10, arch::intel_gpu_dg2_g11,
25+
arch::intel_gpu_dg2_g12, arch::intel_gpu_bmg_g21, arch::intel_gpu_lnl_m,
26+
arch::intel_gpu_arl_h, arch::intel_gpu_ptl_h, arch::intel_gpu_ptl_u};
2627
for (const auto &plt : platform::get_platforms()) {
2728
for (auto &dev : plt.get_devices()) {
2829
try {

0 commit comments

Comments
 (0)