@@ -57,8 +57,8 @@ optional kernel features as defined in section 5.7 of the core SYCL
specification. Each device supports only certain values for the `M`,
`N`, and `K` template parameters and only certain types for the `Ta`,
`Tb`, and `Tc` template parameters. Applications can use the query API
- in `matrix_params` or
- `get_info<ext::oneapi::experimental::info::device::matrix_combinations>`
+ in `matrix_params` or
+ `get_info<ext::oneapi::experimental::info::device::matrix_combinations>`
to determine the set of legal parameters for each device. If the
application submits a kernel using an unsupported `joint_matrix` type
or calls `joint_matrix_mad` with an unsupported combination, the
@@ -269,7 +269,7 @@ The two last overloads of `joint_matrix_load` take
`sycl::ext::oneapi::experimental::annotated_ptr` as argument instead
of `sycl::multi_ptr`. The property list associated with the
`annotated_ptr` argument represents the compile-time constant
- properties for cache control included in the SYCL extenion
+ properties for cache control included in the SYCL extension
link:../../proposed/sycl_ext_intel_cache_controls.asciidoc[sycl_ext_intel_cache_controls]
as illustrated in the example below.
@@ -1109,43 +1109,49 @@ This is currently available in devices with the architecture
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_dg2_g10`,
`architecture::intel_gpu_dg2_g11`, `architecture::intel_gpu_dg2_g12`,
- and `architecture::intel_gpu_arl_h`.
+ `architecture::intel_gpu_arl_h`, `architecture::intel_gpu_ptl_h`, and
+ `architecture::intel_gpu_ptl_u`.
[frame="none",options="header"]
|======================
| A type | B type | C type | D type | M | N | K | device
.2+| `matrix_type::uint8` .2+| `matrix_type::uint8` .2+|
`matrix_type::sint32` .2+| `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32
|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
|8|`architecture::intel_gpu_dg2_g10,
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
`architecture::intel_gpu_arl_h`
.2+| `matrix_type::uint8` .2+| `matrix_type::sint8` .2+|
`matrix_type::sint32` .2+|`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
|8|`architecture::intel_gpu_dg2_g10,
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
`architecture::intel_gpu_arl_h`
.2+| `matrix_type::sint8` .2+| `matrix_type::uint8` .2+|
`matrix_type::sint32` .2+|`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
|8|`architecture::intel_gpu_dg2_g10,
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
`architecture::intel_gpu_arl_h`
.2+| `matrix_type::sint8` .2+| `matrix_type::sint8` .2+|
`matrix_type::sint32` .2+| `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
|8|`architecture::intel_gpu_dg2_g10,
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
`architecture::intel_gpu_arl_h`
.8+|`matrix_type::fp16` .8+| `matrix_type::fp16` .8+|
`matrix_type::fp32` .8+|`matrix_type::fp32` .1+| 16 .1+| 16 | 16
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
.2+| 1 .2+| 64 | 16 |32
.2+| 32 .2+| 64 | 16 |32
.2+| +<=+ 8 | 16 .2+| 16
@@ -1156,24 +1162,28 @@ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
`matrix_type::fp16` .6+|`matrix_type::fp32` .1+| +<=+ 8 | 16 .1+| 16
.6+| `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 | 16 | 32
.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
`matrix_type::fp32` .6+|`matrix_type::fp16` .1+| +<=+ 8 | 16 .1+| 16
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 |16 | 32
.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
`matrix_type::fp16` .6+|`matrix_type::fp16` .1+| +<=+ 8 | 16 .1+| 16
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 |32 .2+| 32 .2+| 64 | 16 | 32
.8+| `matrix_type::bf16` .8+| `matrix_type::bf16` .8+|
`matrix_type::fp32` .8+| `matrix_type::fp32` | 16 | 16 | 16
.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
.2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 | 16 |32
.2+| +<=+ 8 | 16 .2+| 16
@@ -1184,28 +1194,34 @@ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
`matrix_type::bf16` .6+|`matrix_type::fp32` .1+| +<=+ 8 | 16 .1+| 16 .6+|
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 |16 | 32
.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
`matrix_type::fp32` .6+|`matrix_type::bf16` .1+| +<=+ 8 | 16 .1+| 16 .6+|
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 |16 | 32
.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
`matrix_type::bf16` .6+|`matrix_type::bf16` .1+| +<=+ 8 | 16 .1+| 16 .6+|
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
.2+| 32 .2+| 64 |16 | 32
| `matrix_type::tf32` | `matrix_type::tf32` |
`matrix_type::fp32` .2+| `matrix_type::fp32` | +<=+ 8 | 16 | 8 |
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
- `architecture::intel_gpu_lnl_m`
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_ptl_h`,
+ `architecture::intel_gpu_ptl_u`
|======================
- ===== Restrictions on `architecture::intel_gpu_pvc`
+ ===== Restrictions on `architecture::intel_gpu_pvc`,
+ `architecture::intel_gpu_bmg_g21`, `architecture::intel_gpu_lnl_m`,
+ `architecture::intel_gpu_ptl_h`, and `architecture::intel_gpu_ptl_u`
- The `stride` parameter to `joint_matrix_load` and
`joint_matrix_store` has the following restrictions:
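Reviewer note on the table hunks above: as a rough illustration of how the updated rows read, the sketch below models only the first two rows (`uint8` by `uint8` with `sint32` C and D) in plain C++ and checks a shape against them. The names `table_row`, `example_rows`, `is_supported`, and the local `matrix_type` enum are hypothetical stand-ins invented for this note; they are not the extension's API, and a real application would query `matrix_params` or `matrix_combinations` on the device instead. The sketch also assumes that `<= 8` in the M column means any `M` from 1 through 8.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the extension's matrix_type enum (subset only).
enum class matrix_type { uint8, sint8, sint32 };

// Hypothetical model of one table row; not the extension's own struct.
struct table_row {
  matrix_type a, b, c, d;            // A, B, C, D element types
  std::size_t max_m;                 // "<= 8" column: assumed 1 <= M <= max_m
  std::size_t n, k;                  // exact N and K values
  std::vector<std::string> devices;  // architecture names, namespace dropped
};

// First two rows of the table as they read after this change.
inline const std::vector<table_row>& example_rows() {
  static const std::vector<table_row> rows = {
      {matrix_type::uint8, matrix_type::uint8, matrix_type::sint32,
       matrix_type::sint32, 8, 16, 32,
       {"intel_gpu_pvc", "intel_gpu_bmg_g21", "intel_gpu_lnl_m",
        "intel_gpu_ptl_h", "intel_gpu_ptl_u"}},
      {matrix_type::uint8, matrix_type::uint8, matrix_type::sint32,
       matrix_type::sint32, 8, 8, 32,
       {"intel_gpu_dg2_g10", "intel_gpu_dg2_g11", "intel_gpu_dg2_g12",
        "intel_gpu_arl_h"}},
  };
  return rows;
}

// Returns true if some modeled row lists `device` and matches types and shape.
inline bool is_supported(const std::string& device, matrix_type a,
                         matrix_type b, matrix_type c, matrix_type d,
                         std::size_t m, std::size_t n, std::size_t k) {
  for (const table_row& r : example_rows()) {
    const bool on_device =
        std::find(r.devices.begin(), r.devices.end(), device) !=
        r.devices.end();
    const bool types_match = r.a == a && r.b == b && r.c == c && r.d == d;
    const bool shape_match = m >= 1 && m <= r.max_m && n == r.n && k == r.k;
    if (on_device && types_match && shape_match) return true;
  }
  return false;
}
```

Under this model, `intel_gpu_ptl_h` accepts `M = 8, N = 16, K = 32` for the `uint8` rows but rejects `N = 8`, which the table lists only for the DG2 and ARL-H architectures.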