[GH-PAGES] Updated website
gh-actions-bot authored and gh-actions-bot committed Jun 12, 2024
1 parent acda248 commit 22f4a18
Showing 62 changed files with 445 additions and 442 deletions.
Binary file modified main/.doctrees/environment.pickle
Binary file modified main/.doctrees/getting-started/tutorials/01-vector-add.doctree
Binary file modified main/.doctrees/getting-started/tutorials/05-layer-norm.doctree
Binary file modified main/.doctrees/getting-started/tutorials/08-grouped-gemm.doctree
Binary file modified main/.doctrees/python-api/generated/triton.language.div_rn.doctree
Binary file modified main/.doctrees/python-api/triton.language.doctree
Binary file modified main/.doctrees/sg_execution_times.doctree
@@ -95,17 +95,17 @@ def keep(conf):
return True


@triton.autotune(list(filter(keep, configs)), key=["N_CTX"])
@triton.autotune(list(filter(keep, configs)), key=["N_CTX", "HEAD_DIM"])
@triton.jit
def _attn_fwd(Q, K, V, sm_scale, M, Out, #
stride_qz, stride_qh, stride_qm, stride_qk, #
stride_kz, stride_kh, stride_kn, stride_kk, #
stride_vz, stride_vh, stride_vk, stride_vn, #
stride_oz, stride_oh, stride_om, stride_on, #
Z, H, N_CTX, #
HEAD_DIM: tl.constexpr, #
BLOCK_M: tl.constexpr, #
BLOCK_N: tl.constexpr, #
HEAD_DIM: tl.constexpr, #
STAGE: tl.constexpr #
):
tl.static_assert(BLOCK_N <= HEAD_DIM)
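
For readers skimming the hunk above: `triton.autotune` re-runs its configuration search whenever any argument listed in `key` takes a new value, so adding `HEAD_DIM` to the key makes the attention kernel re-tune per head dimension as well as per context length. A minimal, self-contained sketch of that mechanism (kernel and names are illustrative, not from this commit):

    import torch
    import triton
    import triton.language as tl

    # Two candidate configs; the autotuner benchmarks them the first time a
    # new value of every argument named in `key` is seen.
    configs = [
        triton.Config({"BLOCK": 128}, num_warps=4),
        triton.Config({"BLOCK": 256}, num_warps=8),
    ]

    @triton.autotune(configs, key=["N"])  # re-tune whenever N changes
    @triton.jit
    def scale_kernel(x_ptr, out_ptr, N, BLOCK: tl.constexpr):
        pid = tl.program_id(0)
        offs = pid * BLOCK + tl.arange(0, BLOCK)
        mask = offs < N
        x = tl.load(x_ptr + offs, mask=mask)
        tl.store(out_ptr + offs, x * 2.0, mask=mask)

    def scale(x: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        N = x.numel()
        # BLOCK is supplied by the chosen config, so it is not passed here.
        grid = lambda meta: (triton.cdiv(N, meta["BLOCK"]),)
        scale_kernel[grid](x, out, N)
        return out
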
@@ -442,7 +442,7 @@ def forward(ctx, q, k, v, causal, sm_scale):
# shape constraints
HEAD_DIM_Q, HEAD_DIM_K = q.shape[-1], k.shape[-1]
# when v is in float8_e5m2 it is transposed.
HEAD_DIM_V = v.shape[-2] if v.dtype == torch.float8_e5m2 else v.shape[-1]
HEAD_DIM_V = v.shape[-1]
assert HEAD_DIM_Q == HEAD_DIM_K and HEAD_DIM_K == HEAD_DIM_V
assert HEAD_DIM_K in {16, 32, 64, 128, 256}
o = torch.empty_like(q)
@@ -609,6 +609,7 @@ def bench_flash_attention(BATCH, H, N_CTX, HEAD_DIM, causal, mode, provider, dev
if mode == "fwd" and "fp8" in provider:
q = q.to(torch.float8_e5m2)
k = k.to(torch.float8_e5m2)
v = v.permute(0, 1, 3, 2).contiguous()
v = v.permute(0, 1, 3, 2)
v = v.to(torch.float8_e5m2)
sm_scale = 1.3
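
The two attention changes shown above fit together: for the float8_e5m2 benchmark path, `v` is made physically transposed in memory while keeping its logical shape, which is why the forward pass can now read `HEAD_DIM_V` from `v.shape[-1]` unconditionally. A small standalone sketch of that layout trick (shapes are illustrative; assumes a CUDA device and a PyTorch build with float8_e5m2 support):

    import torch

    # Illustrative shapes: (Z, H, N_CTX, HEAD_DIM) = (2, 4, 1024, 64).
    v = torch.randn(2, 4, 1024, 64, device="cuda", dtype=torch.float16)

    # Physically transpose the last two dims, then permute back so the logical
    # shape is unchanged but memory is laid out as (Z, H, HEAD_DIM, N_CTX).
    v = v.permute(0, 1, 3, 2).contiguous()
    v = v.permute(0, 1, 3, 2)
    v = v.to(torch.float8_e5m2)

    print(v.shape)    # torch.Size([2, 4, 1024, 64]); shape[-1] is still HEAD_DIM
    print(v.stride()) # (262144, 65536, 1, 1024); the N_CTX dim is now contiguous
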
@@ -2,7 +2,7 @@
Persistent FP8 Matmul
=====================
This script demonstrates persistent kernel implementations of matrix multiplication using Triton.
It includes various matmul methods, such as naive, persistent, and TMA (Tile Matrix Accumulation) based approaches, and only supports GPUs with compute capability >= 9.0.
It includes various matmul methods, such as naive, persistent, and TMA (Tensor Memory Accelerator) based approaches, and only supports GPUs with compute capability >= 9.0.
Triton and CuBLAS implementations are benchmarked under different configurations and evaluated using the proton profiler.
Users can pass command-line arguments to specify matrix dimensions and iteration steps flexibly.
"""
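
The compute-capability requirement mentioned in this docstring is usually enforced with an explicit runtime check; a minimal sketch of such a guard (the helper name is illustrative, not part of the tutorial):

    import torch

    def supports_tma_fp8() -> bool:
        # TMA and these FP8 paths need a Hopper-class GPU (compute capability 9.0+).
        if not torch.cuda.is_available():
            return False
        return torch.cuda.get_device_capability() >= (9, 0)

    if not supports_tma_fp8():
        raise RuntimeError("This benchmark requires a GPU with compute capability >= 9.0")
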
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Persistent FP8 Matmul\nThis script demonstrates persistent kernel implementations of matrix multiplication using Triton.\nIt includes various matmul methods, such as naive, persistent, and TMA (Tile Matrix Accumulation) based approaches, and only supports GPUs with compute capability >= 9.0.\nTriton and CuBLAS implementations are benchmarked under different configurations and evaluated using the proton profiler.\nUsers can pass command-line arguments to specify matrix dimensions and iteration steps flexibly.\n"
"\n# Persistent FP8 Matmul\nThis script demonstrates persistent kernel implementations of matrix multiplication using Triton.\nIt includes various matmul methods, such as naive, persistent, and TMA (Tensor Memory Accelerator) based approaches, and only supports GPUs with compute capability >= 9.0.\nTriton and CuBLAS implementations are benchmarked under different configurations and evaluated using the proton profiler.\nUsers can pass command-line arguments to specify matrix dimensions and iteration steps flexibly.\n"
]
},
{
Binary file modified main/_images/sphx_glr_01-vector-add_001.png
Binary file modified main/_images/sphx_glr_01-vector-add_thumb.png
Binary file modified main/_images/sphx_glr_02-fused-softmax_001.png
Binary file modified main/_images/sphx_glr_02-fused-softmax_thumb.png
Binary file modified main/_images/sphx_glr_03-matrix-multiplication_001.png
Binary file modified main/_images/sphx_glr_03-matrix-multiplication_002.png
Binary file modified main/_images/sphx_glr_03-matrix-multiplication_thumb.png
Binary file modified main/_images/sphx_glr_05-layer-norm_001.png
Binary file modified main/_images/sphx_glr_05-layer-norm_thumb.png
Binary file modified main/_images/sphx_glr_06-fused-attention_001.png
Binary file modified main/_images/sphx_glr_06-fused-attention_002.png
Binary file modified main/_images/sphx_glr_06-fused-attention_003.png
Binary file modified main/_images/sphx_glr_06-fused-attention_thumb.png
Binary file modified main/_images/sphx_glr_08-grouped-gemm_001.png
Binary file modified main/_images/sphx_glr_08-grouped-gemm_thumb.png
14 changes: 7 additions & 7 deletions main/_sources/getting-started/tutorials/01-vector-add.rst.txt
@@ -230,17 +230,17 @@ We can now run the decorated function above. Pass `print_data=True` to see the p

vector-add-performance:
size Triton Torch
0 4096.0 8.000000 8.000000
1 8192.0 15.999999 15.999999
2 16384.0 31.999999 31.999999
0 4096.0 8.000000 9.600000
1 8192.0 19.200000 15.999999
2 16384.0 38.400001 31.999999
3 32768.0 63.999998 63.999998
4 65536.0 127.999995 127.999995
5 131072.0 219.428568 219.428568
6 262144.0 384.000001 341.333321
7 524288.0 614.400016 558.545450
6 262144.0 384.000001 384.000001
7 524288.0 614.400016 614.400016
8 1048576.0 819.200021 819.200021
9 2097152.0 1068.521715 1023.999964
10 4194304.0 1260.307736 1260.307736
10 4194304.0 1260.307736 1228.800031
11 8388608.0 1424.695621 1424.695621
12 16777216.0 1560.380965 1560.380965
13 33554432.0 1624.859540 1624.859540
@@ -253,7 +253,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 8.965 seconds)
**Total running time of the script:** (0 minutes 8.340 seconds)


.. _sphx_glr_download_getting-started_tutorials_01-vector-add.py:
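
For context, the vector-add table above is produced by a `triton.testing.perf_report`-decorated benchmark whose `run` method accepts `print_data=True`. A simplified sketch of that pattern (sizes, providers, and the measured function are illustrative, not the tutorial's exact code):

    import torch
    import triton
    import triton.testing

    @triton.testing.perf_report(
        triton.testing.Benchmark(
            x_names=["size"], x_vals=[2**i for i in range(12, 16)], x_log=True,
            line_arg="provider", line_vals=["torch"], line_names=["Torch"],
            ylabel="GB/s", plot_name="vector-add-performance", args={},
        ))
    def benchmark(size, provider):
        x = torch.rand(size, device="cuda")
        y = torch.rand(size, device="cuda")
        ms = triton.testing.do_bench(lambda: x + y)
        # 2 reads + 1 write per element, converted to GB/s.
        return 3 * x.numel() * x.element_size() * 1e-9 / (ms * 1e-3)

    # print_data=True prints the table; show_plots=True would also render the figure.
    benchmark.run(print_data=True, show_plots=False)
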
198 changes: 99 additions & 99 deletions main/_sources/getting-started/tutorials/02-fused-softmax.rst.txt
@@ -303,104 +303,104 @@ We will then compare its performance against (1) :code:`torch.softmax` and (2) t
softmax-performance:
N Triton Torch
0 256.0 474.886872 694.483260
1 384.0 608.275337 793.865707
2 512.0 750.325069 907.709196
3 640.0 786.230049 967.184122
4 768.0 882.550621 1019.516965
5 896.0 933.082975 1070.506037
6 1024.0 990.964158 1116.796272
7 1152.0 1110.548913 613.568651
8 1280.0 1139.226810 669.627030
9 1408.0 1161.877551 724.977307
10 1536.0 1192.678787 780.087866
11 1664.0 1208.494960 813.990785
12 1792.0 1228.799871 855.423724
13 1920.0 1253.318071 910.135827
14 2048.0 1272.176961 958.212585
15 2176.0 1262.729082 977.668960
16 2304.0 1263.769744 1009.549848
17 2432.0 1300.221255 1058.137453
18 2560.0 1304.930348 1083.903210
19 2688.0 1310.316031 1101.638060
20 2816.0 1333.607448 1130.354944
21 2944.0 1326.281560 1170.386713
22 3072.0 1354.066553 1182.816857
23 3200.0 1354.196012 1191.882725
24 3328.0 1362.253764 1226.216284
25 3456.0 1372.578742 1249.475639
26 3584.0 1375.513688 1257.543915
27 3712.0 1383.660024 1266.022294
28 3840.0 1384.542849 1301.957450
29 3968.0 1387.246937 1313.791403
30 4096.0 1401.801736 1323.349043
31 4224.0 1336.502774 1162.260414
32 4352.0 1339.787153 1172.044541
33 4480.0 1351.870184 1183.736530
34 4608.0 1357.423399 1196.479290
35 4736.0 1361.989884 1198.912098
36 4864.0 1380.536616 1222.557769
37 4992.0 1369.355432 1238.439229
38 5120.0 1373.097897 1251.131265
39 5248.0 1377.856986 1258.540361
40 5376.0 1380.042163 1286.206289
41 5504.0 1383.007577 1297.681281
42 5632.0 1387.751447 1310.564525
43 5760.0 1393.654050 1328.675932
44 5888.0 1393.712112 1339.936464
45 6016.0 1403.099460 1354.114534
46 6144.0 1409.151442 1373.076994
47 6272.0 1412.647808 1378.095969
48 6400.0 1416.736394 1388.316354
49 6528.0 1415.977502 1395.751557
50 6656.0 1418.974179 1404.295022
51 6784.0 1417.375807 1416.872608
52 6912.0 1426.498550 1424.602149
53 7040.0 1418.695818 1430.935232
54 7168.0 1430.265776 1433.229186
55 7296.0 1434.456799 1443.744323
56 7424.0 1430.059198 1446.000678
57 7552.0 1427.527213 1454.072801
58 7680.0 1438.971950 1461.880615
59 7808.0 1431.538789 1463.922739
60 7936.0 1433.513264 1469.388283
61 8064.0 1438.438301 1475.123808
62 8192.0 1436.675086 1485.034777
63 8320.0 1384.416885 1403.418539
64 8448.0 1381.969366 1402.997531
65 8576.0 1396.067673 1392.623468
66 8704.0 1387.553880 1402.493073
67 8832.0 1384.059893 1403.281876
68 8960.0 1395.940239 1411.576239
69 9088.0 1408.235295 1415.618970
70 9216.0 1405.069496 1427.334371
71 9344.0 1397.409535 1422.342505
72 9472.0 1398.859515 1433.090827
73 9600.0 1397.536414 1430.981887
74 9728.0 1399.649646 1444.275164
75 9856.0 1418.279890 1440.370402
76 9984.0 1403.428461 1452.395129
77 10112.0 1410.351708 1452.974939
78 10240.0 1414.819754 1466.205557
79 10368.0 1411.204824 1462.887115
80 10496.0 1414.606195 1466.766495
81 10624.0 1409.348141 1467.196829
82 10752.0 1402.620501 1473.049479
83 10880.0 1400.636832 1481.012938
84 11008.0 1417.088863 1478.179239
85 11136.0 1423.359373 1486.956090
86 11264.0 1430.935728 1486.185796
87 11392.0 1414.594401 1490.294891
88 11520.0 1425.302497 1494.486333
89 11648.0 1431.407108 1497.469564
90 11776.0 1428.276153 1501.569828
91 11904.0 1439.345217 1507.349950
92 12032.0 1420.063148 1506.856766
93 12160.0 1418.473769 1509.935419
94 12288.0 1438.239656 1392.164615
95 12416.0 1445.141343 1392.158306
96 12544.0 1442.222511 1390.996328
97 12672.0 1446.057806 1389.808823
0 256.0 483.373814 691.410530
1 384.0 618.284228 807.811239
2 512.0 758.376010 913.515445
3 640.0 796.139488 952.027425
4 768.0 884.106445 1017.410462
5 896.0 931.881621 1060.687548
6 1024.0 987.890277 1124.106325
7 1152.0 1109.174905 614.814920
8 1280.0 1155.811618 669.919401
9 1408.0 1159.883220 725.626002
10 1536.0 1195.718508 779.164094
11 1664.0 1211.134824 814.654418
12 1792.0 1234.185297 856.258949
13 1920.0 1254.848750 909.547141
14 2048.0 1280.527072 960.181979
15 2176.0 1266.130989 977.655676
16 2304.0 1269.216999 1008.152209
17 2432.0 1301.518687 1054.155940
18 2560.0 1305.882814 1083.215684
19 2688.0 1309.052134 1103.331214
20 2816.0 1326.479814 1133.798566
21 2944.0 1327.003204 1164.515965
22 3072.0 1355.254207 1182.967957
23 3200.0 1354.312383 1196.631655
24 3328.0 1363.235926 1222.992633
25 3456.0 1374.813321 1247.572000
26 3584.0 1379.471259 1261.300215
27 3712.0 1383.468547 1269.207431
28 3840.0 1390.274965 1303.357204
29 3968.0 1392.744551 1313.293799
30 4096.0 1395.432489 1324.162252
31 4224.0 1330.762484 1160.325788
32 4352.0 1337.949192 1173.518168
33 4480.0 1354.147132 1180.916725
34 4608.0 1364.426067 1194.316176
35 4736.0 1358.579305 1196.606959
36 4864.0 1377.283327 1220.686468
37 4992.0 1373.385271 1235.483630
38 5120.0 1372.565674 1248.511059
39 5248.0 1373.629334 1257.727135
40 5376.0 1374.573949 1286.122332
41 5504.0 1383.845804 1297.084539
42 5632.0 1383.685094 1312.995012
43 5760.0 1395.206666 1324.070267
44 5888.0 1388.315574 1342.564480
45 6016.0 1398.865022 1352.200223
46 6144.0 1412.620414 1373.401229
47 6272.0 1412.668209 1374.286825
48 6400.0 1419.615232 1385.639329
49 6528.0 1412.254639 1396.029543
50 6656.0 1420.756083 1403.343240
51 6784.0 1414.896258 1413.407734
52 6912.0 1428.907998 1423.411192
53 7040.0 1420.585163 1432.429243
54 7168.0 1427.360996 1432.997128
55 7296.0 1431.118521 1444.424889
56 7424.0 1428.966318 1445.439665
57 7552.0 1423.730409 1454.009783
58 7680.0 1433.413558 1459.313934
59 7808.0 1432.827777 1467.103305
60 7936.0 1434.701316 1467.759825
61 8064.0 1440.947239 1472.594700
62 8192.0 1436.460859 1483.730067
63 8320.0 1389.977920 1403.779373
64 8448.0 1378.195253 1405.602653
65 8576.0 1398.062328 1394.030166
66 8704.0 1390.322779 1401.275831
67 8832.0 1385.130096 1402.587790
68 8960.0 1399.843525 1411.445897
69 9088.0 1410.506406 1416.167489
70 9216.0 1405.039396 1425.971433
71 9344.0 1400.395161 1423.667293
72 9472.0 1396.965321 1432.095267
73 9600.0 1393.987492 1436.089884
74 9728.0 1398.630681 1442.419996
75 9856.0 1415.951782 1442.786485
76 9984.0 1400.951074 1451.195595
77 10112.0 1413.151543 1455.469015
78 10240.0 1419.068719 1466.755375
79 10368.0 1415.493789 1460.576378
80 10496.0 1415.119991 1467.882429
81 10624.0 1412.048253 1467.768703
82 10752.0 1413.444281 1472.473186
83 10880.0 1402.165427 1477.500992
84 11008.0 1422.362282 1479.161425
85 11136.0 1422.191503 1484.824616
86 11264.0 1431.432382 1485.935090
87 11392.0 1417.856347 1491.204361
88 11520.0 1422.069283 1492.275406
89 11648.0 1427.668803 1499.136712
90 11776.0 1433.812357 1501.546145
91 11904.0 1445.187994 1506.729085
92 12032.0 1426.405254 1508.512115
93 12160.0 1419.359808 1512.522930
94 12288.0 1435.564739 1391.805939
95 12416.0 1448.499886 1391.218675
96 12544.0 1441.379527 1393.247778
97 12672.0 1449.838003 1391.809115
@@ -415,7 +415,7 @@ In the above plot, we can see that:

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 34.483 seconds)
**Total running time of the script:** (0 minutes 28.602 seconds)


.. _sphx_glr_download_getting-started_tutorials_02-fused-softmax.py:
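
For context, the eager baseline that fused-softmax benchmarks of this kind are typically compared against is a row-wise, multi-kernel softmax; a sketch of such a baseline (assumed, not copied from this diff):

    import torch

    def naive_softmax(x: torch.Tensor) -> torch.Tensor:
        # Each op below launches its own kernel, unlike the fused Triton version.
        x_max = x.max(dim=1, keepdim=True).values
        z = x - x_max                     # subtract the row max for numerical stability
        num = torch.exp(z)
        den = num.sum(dim=1, keepdim=True)
        return num / den
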

0 comments on commit 22f4a18