Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 16 changed files with 648 additions and 632 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,43 +1,44 @@ | ||
CPU Tasks | ||
LoadModel 6.69478 seconds | ||
RunComplete 33.7046 seconds | ||
Run 33.637 seconds | ||
Callbacks 12.7347 milliseconds, 44 calls, 289.425 microseconds average | ||
Spectrogram 679.962 milliseconds, 41 calls, 16.5844 milliseconds average | ||
Sample 64.9643 milliseconds, 527 calls, 123.272 microseconds average | ||
Encode 13.5814 seconds, 9 calls, 1.50905 seconds average | ||
Decode 20.0426 seconds, 9 calls, 2.22696 seconds average | ||
DecodeStep 19.9774 seconds, 527 calls, 37.9077 milliseconds average | ||
LoadModel 950.578 milliseconds | ||
RunComplete 27.5329 seconds | ||
Run 27.434 seconds | ||
Callbacks 10.6484 milliseconds, 44 calls, 242.009 microseconds average | ||
Spectrogram 199.106 milliseconds, 41 calls, 4.85624 milliseconds average | ||
Sample 58.7404 milliseconds, 527 calls, 111.462 microseconds average | ||
Encode 11.3813 seconds, 9 calls, 1.26459 seconds average | ||
Decode 16.0418 seconds, 9 calls, 1.78242 seconds average | ||
DecodeStep 15.9829 seconds, 527 calls, 30.3281 milliseconds average | ||
GPU Tasks | ||
LoadModel 6.50695 seconds | ||
Run 33.4847 seconds | ||
Encode 13.6283 seconds, 9 calls, 1.51426 seconds average | ||
EncodeLayer 11.6754 seconds, 288 calls, 40.5397 milliseconds average | ||
Decode 19.8563 seconds, 9 calls, 2.20626 seconds average | ||
DecodeStep 19.8559 seconds, 527 calls, 37.6773 milliseconds average | ||
DecodeLayer 18.5337 seconds, 16864 calls, 1.09901 milliseconds average | ||
LoadModel 805.211 milliseconds | ||
Run 27.338 seconds | ||
Encode 11.3967 seconds, 9 calls, 1.2663 seconds average | ||
EncodeLayer 9.78685 seconds, 288 calls, 33.9821 milliseconds average | ||
Decode 15.9412 seconds, 9 calls, 1.77125 seconds average | ||
DecodeStep 15.9412 seconds, 527 calls, 30.249 milliseconds average | ||
DecodeLayer 15.0511 seconds, 16864 calls, 892.499 microseconds average | ||
Compute Shaders | ||
mulMatTiled 14.6726 seconds, 6345 calls, 2.31247 milliseconds average | ||
mulMatByRowTiled 11.8939 seconds, 199430 calls, 59.6393 microseconds average | ||
norm 1.3396 seconds, 51704 calls, 25.909 microseconds average | ||
softMax 858.923 milliseconds, 17391 calls, 49.3889 microseconds average | ||
addRepeat 792.962 milliseconds, 68896 calls, 11.5096 microseconds average | ||
fmaRepeat1 567.753 milliseconds, 51704 calls, 10.9808 microseconds average | ||
copyConvert 541.081 milliseconds, 34880 calls, 15.5126 microseconds average | ||
softMaxFixed 523.378 milliseconds, 17152 calls, 30.5141 microseconds average | ||
copyTranspose 422.677 milliseconds, 34304 calls, 12.3215 microseconds average | ||
addRepeatScale 329.963 milliseconds, 33728 calls, 9.78305 microseconds average | ||
addInPlace 306.328 milliseconds, 34304 calls, 8.92981 microseconds average | ||
addRepeatGelu 290.074 milliseconds, 17170 calls, 16.8942 microseconds average | ||
scaleInPlace 237.756 milliseconds, 17152 calls, 13.8617 microseconds average | ||
add 196.816 milliseconds, 16873 calls, 11.6645 microseconds average | ||
convolutionMain2Fixed 187.457 milliseconds, 9 calls, 20.8285 milliseconds average | ||
diagMaskInf 103.247 milliseconds, 16864 calls, 6.12231 microseconds average | ||
convolutionMain 75.5589 milliseconds, 9 calls, 8.39543 milliseconds average | ||
convolutionPrep1 21.4927 milliseconds, 18 calls, 1.19404 milliseconds average | ||
addRows 9.2908 milliseconds, 527 calls, 17.6296 microseconds average | ||
convolutionPrep2 5.0944 milliseconds, 18 calls, 283.022 microseconds average | ||
mulMatTiled 12.0503 seconds, 6345 calls, 1.89919 milliseconds average | ||
mulMatByRowTiled 9.45404 seconds, 199430 calls, 47.4053 microseconds average | ||
norm 1.32432 seconds, 51704 calls, 25.6135 microseconds average | ||
fmaRepeat1 583.884 milliseconds, 51704 calls, 11.2928 microseconds average | ||
addRepeatEx 536.551 milliseconds, 51168 calls, 10.4861 microseconds average | ||
softMaxFixed 534.105 milliseconds, 17152 calls, 31.1395 microseconds average | ||
copyConvert 500.4 milliseconds, 34880 calls, 14.3463 microseconds average | ||
copyTranspose 377.38 milliseconds, 34304 calls, 11.001 microseconds average | ||
addRepeatScale 315.294 milliseconds, 33728 calls, 9.34814 microseconds average | ||
addRepeatGelu 283.978 milliseconds, 17170 calls, 16.5392 microseconds average | ||
softMaxLong 245.57 milliseconds, 527 calls, 465.976 microseconds average | ||
scaleInPlace 226.545 milliseconds, 17152 calls, 13.2081 microseconds average | ||
softMax 212.206 milliseconds, 16864 calls, 12.5834 microseconds average | ||
addRepeat 209.397 milliseconds, 17728 calls, 11.8117 microseconds average | ||
convolutionMain2Fixed 184.615 milliseconds, 9 calls, 20.5128 milliseconds average | ||
diagMaskInf 107.423 milliseconds, 16864 calls, 6.36998 microseconds average | ||
convolutionMain 74.7954 milliseconds, 9 calls, 8.3106 milliseconds average | ||
convolutionPrep1 20.9316 milliseconds, 18 calls, 1.16287 milliseconds average | ||
convolutionPrep2 3.8103 milliseconds, 18 calls, 211.683 microseconds average | ||
addRows 3.7939 milliseconds, 527 calls, 7.19905 microseconds average | ||
add 1.0895 milliseconds, 9 calls, 121.056 microseconds average | ||
Memory Usage | ||
Model 892.591 KB RAM, 2.8815 GB VRAM | ||
Context 92.2616 MB RAM, 1.20719 GB VRAM | ||
Total 93.1333 MB RAM, 4.08869 GB VRAM | ||
Context 92.2612 MB RAM, 1.14026 GB VRAM | ||
Total 93.1329 MB RAM, 4.02176 GB VRAM |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,43 +1,44 @@ | ||
CPU Tasks | ||
LoadModel 1.39046 seconds | ||
RunComplete 98.7705 seconds | ||
Run 98.6893 seconds | ||
Callbacks 10.9446 milliseconds, 44 calls, 248.741 microseconds average | ||
Spectrogram 1.10864 seconds, 41 calls, 27.04 milliseconds average | ||
Sample 62.5537 milliseconds, 527 calls, 118.698 microseconds average | ||
Encode 60.6321 seconds, 9 calls, 6.7369 seconds average | ||
Decode 38.0118 seconds, 9 calls, 4.22353 seconds average | ||
DecodeStep 37.949 seconds, 527 calls, 72.0095 milliseconds average | ||
LoadModel 7.95251 seconds | ||
RunComplete 109.423 seconds | ||
Run 109.351 seconds | ||
Callbacks 12.7226 milliseconds, 44 calls, 289.15 microseconds average | ||
Spectrogram 270.286 milliseconds, 41 calls, 6.59235 milliseconds average | ||
Sample 69.0965 milliseconds, 527 calls, 131.113 microseconds average | ||
Encode 35.943 seconds, 9 calls, 3.99366 seconds average | ||
Decode 73.3946 seconds, 9 calls, 8.15496 seconds average | ||
DecodeStep 73.3251 seconds, 527 calls, 139.137 milliseconds average | ||
GPU Tasks | ||
LoadModel 1.19991 seconds | ||
Run 98.4248 seconds | ||
Encode 61.0298 seconds, 9 calls, 6.78109 seconds average | ||
EncodeLayer 51.7844 seconds, 288 calls, 179.807 milliseconds average | ||
Decode 37.395 seconds, 9 calls, 4.155 seconds average | ||
DecodeStep 37.3947 seconds, 527 calls, 70.9577 milliseconds average | ||
DecodeLayer 34.8821 seconds, 16864 calls, 2.06843 milliseconds average | ||
LoadModel 7.55659 seconds | ||
Run 109.16 seconds | ||
Encode 36.3141 seconds, 9 calls, 4.0349 seconds average | ||
EncodeLayer 29.8405 seconds, 288 calls, 103.613 milliseconds average | ||
Decode 72.8459 seconds, 9 calls, 8.09398 seconds average | ||
DecodeStep 72.8458 seconds, 527 calls, 138.227 milliseconds average | ||
DecodeLayer 69.0153 seconds, 16864 calls, 4.09247 milliseconds average | ||
Compute Shaders | ||
mulMatTiled 65.2919 seconds, 6345 calls, 10.2903 milliseconds average | ||
mulMatByRowTiled 22.3701 seconds, 199430 calls, 112.17 microseconds average | ||
convolutionMain2Fixed 1.37801 seconds, 9 calls, 153.113 milliseconds average | ||
softMaxFixed 1.32519 seconds, 17152 calls, 77.2618 microseconds average | ||
addRepeat 1.0237 seconds, 68896 calls, 14.8586 microseconds average | ||
copyTranspose 974.149 milliseconds, 34304 calls, 28.3975 microseconds average | ||
norm 971.572 milliseconds, 51704 calls, 18.791 microseconds average | ||
softMax 956.611 milliseconds, 17391 calls, 55.0061 microseconds average | ||
copyConvert 899.362 milliseconds, 34880 calls, 25.7845 microseconds average | ||
fmaRepeat1 675.729 milliseconds, 51704 calls, 13.0692 microseconds average | ||
addRepeatGelu 531.623 milliseconds, 17170 calls, 30.9623 microseconds average | ||
addInPlace 461.61 milliseconds, 34304 calls, 13.4564 microseconds average | ||
scaleInPlace 394.457 milliseconds, 17152 calls, 22.9978 microseconds average | ||
convolutionMain 331.124 milliseconds, 9 calls, 36.7915 milliseconds average | ||
addRepeatScale 329.854 milliseconds, 33728 calls, 9.77983 microseconds average | ||
add 203.376 milliseconds, 16873 calls, 12.0534 microseconds average | ||
diagMaskInf 107.127 milliseconds, 16864 calls, 6.3524 microseconds average | ||
convolutionPrep1 58.8876 milliseconds, 18 calls, 3.27153 milliseconds average | ||
convolutionPrep2 9.1367 milliseconds, 18 calls, 507.594 microseconds average | ||
addRows 3.6551 milliseconds, 527 calls, 6.93567 microseconds average | ||
mulMatTiled 36.8159 seconds, 6345 calls, 5.80234 milliseconds average | ||
mulMatByRowTiled 28.0431 seconds, 199430 calls, 140.616 microseconds average | ||
copyTranspose 8.11917 seconds, 34304 calls, 236.683 microseconds average | ||
fmaRepeat1 7.85961 seconds, 51704 calls, 152.012 microseconds average | ||
addRepeatScale 4.11915 seconds, 33728 calls, 122.129 microseconds average | ||
softMaxFixed 3.22072 seconds, 17152 calls, 187.775 microseconds average | ||
copyConvert 2.8333 seconds, 34880 calls, 81.2298 microseconds average | ||
addRepeatEx 2.78075 seconds, 51168 calls, 54.3455 microseconds average | ||
norm 2.76591 seconds, 51704 calls, 53.495 microseconds average | ||
addRepeatGelu 2.35162 seconds, 17170 calls, 136.961 microseconds average | ||
softMaxLong 2.24788 seconds, 527 calls, 4.26543 milliseconds average | ||
softMax 2.21477 seconds, 16864 calls, 131.331 microseconds average | ||
convolutionMain2Fixed 1.38064 seconds, 9 calls, 153.405 milliseconds average | ||
addRepeat 1.30665 seconds, 17728 calls, 73.7057 microseconds average | ||
scaleInPlace 1.10329 seconds, 17152 calls, 64.3245 microseconds average | ||
diagMaskInf 937.457 milliseconds, 16864 calls, 55.5892 microseconds average | ||
convolutionMain 374.967 milliseconds, 9 calls, 41.663 milliseconds average | ||
convolutionPrep1 119.171 milliseconds, 18 calls, 6.62059 milliseconds average | ||
convolutionPrep2 27.8894 milliseconds, 18 calls, 1.54941 milliseconds average | ||
addRows 5.2536 milliseconds, 527 calls, 9.96888 microseconds average | ||
add 2.8285 milliseconds, 9 calls, 314.278 microseconds average | ||
Memory Usage | ||
Model 892.591 KB RAM, 2.8815 GB VRAM | ||
Context 92.2616 MB RAM, 1.20719 GB VRAM | ||
Total 93.1333 MB RAM, 4.08869 GB VRAM | ||
Context 92.2612 MB RAM, 1.14026 GB VRAM | ||
Total 93.1329 MB RAM, 4.02176 GB VRAM |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,46 +1,47 @@ | ||
CPU Tasks | ||
LoadModel 3.44286 seconds | ||
RunComplete 174.677 seconds | ||
Run 174.601 seconds | ||
Callbacks 22.604 milliseconds, 44 calls, 513.727 microseconds average | ||
Spectrogram 1.65973 seconds, 41 calls, 40.4812 milliseconds average | ||
Sample 148.233 milliseconds, 527 calls, 281.276 microseconds average | ||
Encode 110.192 seconds, 9 calls, 12.2436 seconds average | ||
Decode 64.3834 seconds, 9 calls, 7.15371 seconds average | ||
DecodeStep 64.2344 seconds, 527 calls, 121.887 milliseconds average | ||
LoadModel 2.88964 seconds | ||
RunComplete 140.747 seconds | ||
Run 140.661 seconds | ||
Callbacks 20.302 milliseconds, 44 calls, 461.409 microseconds average | ||
Spectrogram 468.419 milliseconds, 41 calls, 11.4249 milliseconds average | ||
Sample 139.558 milliseconds, 527 calls, 264.815 microseconds average | ||
Encode 87.5396 seconds, 9 calls, 9.72662 seconds average | ||
Decode 53.0971 seconds, 9 calls, 5.89968 seconds average | ||
DecodeStep 52.9566 seconds, 527 calls, 100.487 milliseconds average | ||
GPU Tasks | ||
LoadModel 2.20374 seconds | ||
Run 173.895 seconds | ||
Encode 111.531 seconds, 9 calls, 12.3923 seconds average | ||
EncodeLayer 96.2295 seconds, 288 calls, 334.13 milliseconds average | ||
Decode 62.3642 seconds, 9 calls, 6.92936 seconds average | ||
DecodeStep 62.3636 seconds, 527 calls, 118.337 milliseconds average | ||
DecodeLayer 58.6225 seconds, 16864 calls, 3.47619 milliseconds average | ||
LoadModel 1.86694 seconds | ||
Run 140.175 seconds | ||
Encode 88.7441 seconds, 9 calls, 9.86046 seconds average | ||
EncodeLayer 75.809 seconds, 288 calls, 263.226 milliseconds average | ||
Decode 51.4306 seconds, 9 calls, 5.71451 seconds average | ||
DecodeStep 51.43 seconds, 527 calls, 97.5901 milliseconds average | ||
DecodeLayer 48.1822 seconds, 16864 calls, 2.85711 milliseconds average | ||
Compute Shaders | ||
mulMatTiledEx 89.3411 seconds, 2880 calls, 31.0212 milliseconds average | ||
mulMatTiled 25.4265 seconds, 3465 calls, 7.33809 milliseconds average | ||
mulMatByRowTiled 22.2805 seconds, 166278 calls, 133.995 microseconds average | ||
mulMatByRowTiledEx 13.8414 seconds, 33152 calls, 417.514 microseconds average | ||
softMaxFixed 3.90482 seconds, 17152 calls, 227.66 microseconds average | ||
addRepeatGelu 2.52778 seconds, 17170 calls, 147.221 microseconds average | ||
norm 2.10933 seconds, 51704 calls, 40.7962 microseconds average | ||
convolutionMain2Fixed 2.06899 seconds, 9 calls, 229.888 milliseconds average | ||
matReshapePanels 1.99444 seconds, 1737 calls, 1.14821 milliseconds average | ||
addRepeat 1.84752 seconds, 68896 calls, 26.816 microseconds average | ||
fmaRepeat1 1.28479 seconds, 51704 calls, 24.849 microseconds average | ||
copyConvert 1.23617 seconds, 34880 calls, 35.4406 microseconds average | ||
softMax 1.11773 seconds, 17391 calls, 64.2704 microseconds average | ||
scaleInPlace 848.371 milliseconds, 17152 calls, 49.4619 microseconds average | ||
copyTranspose 796.781 milliseconds, 34304 calls, 23.227 microseconds average | ||
addInPlace 733.523 milliseconds, 34304 calls, 21.383 microseconds average | ||
addRepeatScale 727.214 milliseconds, 33728 calls, 21.5611 microseconds average | ||
convolutionMain 535.149 milliseconds, 9 calls, 59.461 milliseconds average | ||
add 525.766 milliseconds, 16873 calls, 31.1602 microseconds average | ||
diagMaskInf 361.151 milliseconds, 16864 calls, 21.4155 microseconds average | ||
convolutionPrep1 58.0177 milliseconds, 18 calls, 3.22321 milliseconds average | ||
convolutionPrep2 30.1294 milliseconds, 18 calls, 1.67386 milliseconds average | ||
addRows 1.8544 milliseconds, 527 calls, 3.51879 microseconds average | ||
mulMatTiledEx 69.1011 seconds, 2880 calls, 23.9934 milliseconds average | ||
mulMatTiled 21.009 seconds, 3465 calls, 6.06321 milliseconds average | ||
mulMatByRowTiled 20.0965 seconds, 166278 calls, 120.861 microseconds average | ||
mulMatByRowTiledEx 9.61326 seconds, 33152 calls, 289.975 microseconds average | ||
softMaxFixed 3.7631 seconds, 17152 calls, 219.397 microseconds average | ||
norm 2.23806 seconds, 51704 calls, 43.2859 microseconds average | ||
convolutionMain2Fixed 2.12825 seconds, 9 calls, 236.472 milliseconds average | ||
matReshapePanels 2.0333 seconds, 1737 calls, 1.17058 milliseconds average | ||
addRepeatGelu 1.5491 seconds, 17170 calls, 90.2211 microseconds average | ||
scaleInPlace 1.32928 seconds, 17152 calls, 77.5001 microseconds average | ||
copyConvert 1.23135 seconds, 34880 calls, 35.3026 microseconds average | ||
fmaRepeat1 1.10337 seconds, 51704 calls, 21.3401 microseconds average | ||
addRepeatEx 1.00095 seconds, 51168 calls, 19.562 microseconds average | ||
copyTranspose 846.807 milliseconds, 34304 calls, 24.6854 microseconds average | ||
addRepeat 704.028 milliseconds, 17728 calls, 39.7128 microseconds average | ||
softMaxLong 608.58 milliseconds, 527 calls, 1.1548 milliseconds average | ||
convolutionMain 522.249 milliseconds, 9 calls, 58.0277 milliseconds average | ||
addRepeatScale 500.937 milliseconds, 33728 calls, 14.8523 microseconds average | ||
softMax 236.054 milliseconds, 16864 calls, 13.9975 microseconds average | ||
diagMaskInf 171.964 milliseconds, 16864 calls, 10.1971 microseconds average | ||
convolutionPrep1 60.7331 milliseconds, 18 calls, 3.37406 milliseconds average | ||
convolutionPrep2 33.441 milliseconds, 18 calls, 1.85783 milliseconds average | ||
add 12.0883 milliseconds, 9 calls, 1.34314 milliseconds average | ||
addRows 1.9724 milliseconds, 527 calls, 3.74269 microseconds average | ||
Memory Usage | ||
Model 892.591 KB RAM, 2.8815 GB VRAM | ||
Context 92.2617 MB RAM, 1.27432 GB VRAM | ||
Total 93.1334 MB RAM, 4.15582 GB VRAM | ||
Context 92.2612 MB RAM, 1.19934 GB VRAM | ||
Total 93.1329 MB RAM, 4.08084 GB VRAM |
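The rows in these logs follow the pattern `name total unit, N calls, average unit average`, so the before/after pairs can be compared mechanically. Here is a minimal sketch, assuming that row shape and the three unit names seen above. The helper names `parse_row` and `speedup` are hypothetical and not part of the project. It takes the two `mulMatTiledEx` rows from the last file, checks that the reported average matches total divided by calls, and prints the ratio of old to new total time.

```python
import re

# Unit names as they appear in the profiler rows, converted to seconds.
UNIT_SECONDS = {"seconds": 1.0, "milliseconds": 1e-3, "microseconds": 1e-6}

# Assumed row shape: "<name> <total> <unit>[, <calls> calls, <avg> <unit> average]"
ROW = re.compile(
    r"^(?P<name>\w+)\s+(?P<total>[\d.]+) (?P<tunit>\w+)"
    r"(?:, (?P<calls>\d+) calls, (?P<avg>[\d.]+) (?P<aunit>\w+) average)?$"
)


def parse_row(line):
    """Hypothetical helper: turn one profiler row into a dict with times in seconds."""
    text = line.split("|")[0].strip()  # drop the trailing diff-table cells
    m = ROW.match(text)
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    row = {
        "name": m["name"],
        "total_s": float(m["total"]) * UNIT_SECONDS[m["tunit"]],
        "calls": None,
        "avg_s": None,
    }
    if m["calls"] is not None:
        row["calls"] = int(m["calls"])
        row["avg_s"] = float(m["avg"]) * UNIT_SECONDS[m["aunit"]]
    return row


def speedup(old_line, new_line):
    """Ratio of old total time to new total time; above 1.0 means the new run is faster."""
    return parse_row(old_line)["total_s"] / parse_row(new_line)["total_s"]


old = "mulMatTiledEx 89.3411 seconds, 2880 calls, 31.0212 milliseconds average | |"
new = "mulMatTiledEx 69.1011 seconds, 2880 calls, 23.9934 milliseconds average | |"

row = parse_row(old)
# The reported average should match total / calls up to rounding.
assert abs(row["total_s"] / row["calls"] - row["avg_s"]) < 1e-6
print(f"{row['name']}: {speedup(old, new):.2f}x")
```

Rows without a call count, such as `LoadModel`, parse with `calls` and `avg_s` set to `None`. The Memory Usage rows do not fit this pattern and raise `ValueError`.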