Performance results for version 1.5
Const-me committed Jan 24, 2023
1 parent 3cdb0f3 commit 993cfa5
Showing 16 changed files with 648 additions and 632 deletions.
77 changes: 39 additions & 38 deletions SampleClips/columbia-large-1080ti.txt
@@ -1,43 +1,44 @@
 CPU Tasks
LoadModel 6.69478 seconds
RunComplete 33.7046 seconds
Run 33.637 seconds
Callbacks 12.7347 milliseconds, 44 calls, 289.425 microseconds average
Spectrogram 679.962 milliseconds, 41 calls, 16.5844 milliseconds average
Sample 64.9643 milliseconds, 527 calls, 123.272 microseconds average
Encode 13.5814 seconds, 9 calls, 1.50905 seconds average
Decode 20.0426 seconds, 9 calls, 2.22696 seconds average
DecodeStep 19.9774 seconds, 527 calls, 37.9077 milliseconds average
LoadModel 950.578 milliseconds
RunComplete 27.5329 seconds
Run 27.434 seconds
Callbacks 10.6484 milliseconds, 44 calls, 242.009 microseconds average
Spectrogram 199.106 milliseconds, 41 calls, 4.85624 milliseconds average
Sample 58.7404 milliseconds, 527 calls, 111.462 microseconds average
Encode 11.3813 seconds, 9 calls, 1.26459 seconds average
Decode 16.0418 seconds, 9 calls, 1.78242 seconds average
DecodeStep 15.9829 seconds, 527 calls, 30.3281 milliseconds average
GPU Tasks
LoadModel 6.50695 seconds
Run 33.4847 seconds
Encode 13.6283 seconds, 9 calls, 1.51426 seconds average
EncodeLayer 11.6754 seconds, 288 calls, 40.5397 milliseconds average
Decode 19.8563 seconds, 9 calls, 2.20626 seconds average
DecodeStep 19.8559 seconds, 527 calls, 37.6773 milliseconds average
DecodeLayer 18.5337 seconds, 16864 calls, 1.09901 milliseconds average
LoadModel 805.211 milliseconds
Run 27.338 seconds
Encode 11.3967 seconds, 9 calls, 1.2663 seconds average
EncodeLayer 9.78685 seconds, 288 calls, 33.9821 milliseconds average
Decode 15.9412 seconds, 9 calls, 1.77125 seconds average
DecodeStep 15.9412 seconds, 527 calls, 30.249 milliseconds average
DecodeLayer 15.0511 seconds, 16864 calls, 892.499 microseconds average
Compute Shaders
mulMatTiled 14.6726 seconds, 6345 calls, 2.31247 milliseconds average
mulMatByRowTiled 11.8939 seconds, 199430 calls, 59.6393 microseconds average
norm 1.3396 seconds, 51704 calls, 25.909 microseconds average
softMax 858.923 milliseconds, 17391 calls, 49.3889 microseconds average
addRepeat 792.962 milliseconds, 68896 calls, 11.5096 microseconds average
fmaRepeat1 567.753 milliseconds, 51704 calls, 10.9808 microseconds average
copyConvert 541.081 milliseconds, 34880 calls, 15.5126 microseconds average
softMaxFixed 523.378 milliseconds, 17152 calls, 30.5141 microseconds average
copyTranspose 422.677 milliseconds, 34304 calls, 12.3215 microseconds average
addRepeatScale 329.963 milliseconds, 33728 calls, 9.78305 microseconds average
addInPlace 306.328 milliseconds, 34304 calls, 8.92981 microseconds average
addRepeatGelu 290.074 milliseconds, 17170 calls, 16.8942 microseconds average
scaleInPlace 237.756 milliseconds, 17152 calls, 13.8617 microseconds average
add 196.816 milliseconds, 16873 calls, 11.6645 microseconds average
convolutionMain2Fixed 187.457 milliseconds, 9 calls, 20.8285 milliseconds average
diagMaskInf 103.247 milliseconds, 16864 calls, 6.12231 microseconds average
convolutionMain 75.5589 milliseconds, 9 calls, 8.39543 milliseconds average
convolutionPrep1 21.4927 milliseconds, 18 calls, 1.19404 milliseconds average
addRows 9.2908 milliseconds, 527 calls, 17.6296 microseconds average
convolutionPrep2 5.0944 milliseconds, 18 calls, 283.022 microseconds average
mulMatTiled 12.0503 seconds, 6345 calls, 1.89919 milliseconds average
mulMatByRowTiled 9.45404 seconds, 199430 calls, 47.4053 microseconds average
norm 1.32432 seconds, 51704 calls, 25.6135 microseconds average
fmaRepeat1 583.884 milliseconds, 51704 calls, 11.2928 microseconds average
addRepeatEx 536.551 milliseconds, 51168 calls, 10.4861 microseconds average
softMaxFixed 534.105 milliseconds, 17152 calls, 31.1395 microseconds average
copyConvert 500.4 milliseconds, 34880 calls, 14.3463 microseconds average
copyTranspose 377.38 milliseconds, 34304 calls, 11.001 microseconds average
addRepeatScale 315.294 milliseconds, 33728 calls, 9.34814 microseconds average
addRepeatGelu 283.978 milliseconds, 17170 calls, 16.5392 microseconds average
softMaxLong 245.57 milliseconds, 527 calls, 465.976 microseconds average
scaleInPlace 226.545 milliseconds, 17152 calls, 13.2081 microseconds average
softMax 212.206 milliseconds, 16864 calls, 12.5834 microseconds average
addRepeat 209.397 milliseconds, 17728 calls, 11.8117 microseconds average
convolutionMain2Fixed 184.615 milliseconds, 9 calls, 20.5128 milliseconds average
diagMaskInf 107.423 milliseconds, 16864 calls, 6.36998 microseconds average
convolutionMain 74.7954 milliseconds, 9 calls, 8.3106 milliseconds average
convolutionPrep1 20.9316 milliseconds, 18 calls, 1.16287 milliseconds average
convolutionPrep2 3.8103 milliseconds, 18 calls, 211.683 microseconds average
addRows 3.7939 milliseconds, 527 calls, 7.19905 microseconds average
add 1.0895 milliseconds, 9 calls, 121.056 microseconds average
Memory Usage
Model 892.591 KB RAM, 2.8815 GB VRAM
Context 92.2616 MB RAM, 1.20719 GB VRAM
Total 93.1333 MB RAM, 4.08869 GB VRAM
Context 92.2612 MB RAM, 1.14026 GB VRAM
Total 93.1329 MB RAM, 4.02176 GB VRAM
77 changes: 39 additions & 38 deletions SampleClips/columbia-large-1650.txt
@@ -1,43 +1,44 @@
CPU Tasks
LoadModel 1.39046 seconds
RunComplete 98.7705 seconds
Run 98.6893 seconds
Callbacks 10.9446 milliseconds, 44 calls, 248.741 microseconds average
Spectrogram 1.10864 seconds, 41 calls, 27.04 milliseconds average
Sample 62.5537 milliseconds, 527 calls, 118.698 microseconds average
Encode 60.6321 seconds, 9 calls, 6.7369 seconds average
Decode 38.0118 seconds, 9 calls, 4.22353 seconds average
DecodeStep 37.949 seconds, 527 calls, 72.0095 milliseconds average
LoadModel 7.95251 seconds
RunComplete 109.423 seconds
Run 109.351 seconds
Callbacks 12.7226 milliseconds, 44 calls, 289.15 microseconds average
Spectrogram 270.286 milliseconds, 41 calls, 6.59235 milliseconds average
Sample 69.0965 milliseconds, 527 calls, 131.113 microseconds average
Encode 35.943 seconds, 9 calls, 3.99366 seconds average
Decode 73.3946 seconds, 9 calls, 8.15496 seconds average
DecodeStep 73.3251 seconds, 527 calls, 139.137 milliseconds average
GPU Tasks
LoadModel 1.19991 seconds
Run 98.4248 seconds
Encode 61.0298 seconds, 9 calls, 6.78109 seconds average
EncodeLayer 51.7844 seconds, 288 calls, 179.807 milliseconds average
Decode 37.395 seconds, 9 calls, 4.155 seconds average
DecodeStep 37.3947 seconds, 527 calls, 70.9577 milliseconds average
DecodeLayer 34.8821 seconds, 16864 calls, 2.06843 milliseconds average
LoadModel 7.55659 seconds
Run 109.16 seconds
Encode 36.3141 seconds, 9 calls, 4.0349 seconds average
EncodeLayer 29.8405 seconds, 288 calls, 103.613 milliseconds average
Decode 72.8459 seconds, 9 calls, 8.09398 seconds average
DecodeStep 72.8458 seconds, 527 calls, 138.227 milliseconds average
DecodeLayer 69.0153 seconds, 16864 calls, 4.09247 milliseconds average
Compute Shaders
mulMatTiled 65.2919 seconds, 6345 calls, 10.2903 milliseconds average
mulMatByRowTiled 22.3701 seconds, 199430 calls, 112.17 microseconds average
convolutionMain2Fixed 1.37801 seconds, 9 calls, 153.113 milliseconds average
softMaxFixed 1.32519 seconds, 17152 calls, 77.2618 microseconds average
addRepeat 1.0237 seconds, 68896 calls, 14.8586 microseconds average
copyTranspose 974.149 milliseconds, 34304 calls, 28.3975 microseconds average
norm 971.572 milliseconds, 51704 calls, 18.791 microseconds average
softMax 956.611 milliseconds, 17391 calls, 55.0061 microseconds average
copyConvert 899.362 milliseconds, 34880 calls, 25.7845 microseconds average
fmaRepeat1 675.729 milliseconds, 51704 calls, 13.0692 microseconds average
addRepeatGelu 531.623 milliseconds, 17170 calls, 30.9623 microseconds average
addInPlace 461.61 milliseconds, 34304 calls, 13.4564 microseconds average
scaleInPlace 394.457 milliseconds, 17152 calls, 22.9978 microseconds average
convolutionMain 331.124 milliseconds, 9 calls, 36.7915 milliseconds average
addRepeatScale 329.854 milliseconds, 33728 calls, 9.77983 microseconds average
add 203.376 milliseconds, 16873 calls, 12.0534 microseconds average
diagMaskInf 107.127 milliseconds, 16864 calls, 6.3524 microseconds average
convolutionPrep1 58.8876 milliseconds, 18 calls, 3.27153 milliseconds average
convolutionPrep2 9.1367 milliseconds, 18 calls, 507.594 microseconds average
addRows 3.6551 milliseconds, 527 calls, 6.93567 microseconds average
mulMatTiled 36.8159 seconds, 6345 calls, 5.80234 milliseconds average
mulMatByRowTiled 28.0431 seconds, 199430 calls, 140.616 microseconds average
copyTranspose 8.11917 seconds, 34304 calls, 236.683 microseconds average
fmaRepeat1 7.85961 seconds, 51704 calls, 152.012 microseconds average
addRepeatScale 4.11915 seconds, 33728 calls, 122.129 microseconds average
softMaxFixed 3.22072 seconds, 17152 calls, 187.775 microseconds average
copyConvert 2.8333 seconds, 34880 calls, 81.2298 microseconds average
addRepeatEx 2.78075 seconds, 51168 calls, 54.3455 microseconds average
norm 2.76591 seconds, 51704 calls, 53.495 microseconds average
addRepeatGelu 2.35162 seconds, 17170 calls, 136.961 microseconds average
softMaxLong 2.24788 seconds, 527 calls, 4.26543 milliseconds average
softMax 2.21477 seconds, 16864 calls, 131.331 microseconds average
convolutionMain2Fixed 1.38064 seconds, 9 calls, 153.405 milliseconds average
addRepeat 1.30665 seconds, 17728 calls, 73.7057 microseconds average
scaleInPlace 1.10329 seconds, 17152 calls, 64.3245 microseconds average
diagMaskInf 937.457 milliseconds, 16864 calls, 55.5892 microseconds average
convolutionMain 374.967 milliseconds, 9 calls, 41.663 milliseconds average
convolutionPrep1 119.171 milliseconds, 18 calls, 6.62059 milliseconds average
convolutionPrep2 27.8894 milliseconds, 18 calls, 1.54941 milliseconds average
addRows 5.2536 milliseconds, 527 calls, 9.96888 microseconds average
add 2.8285 milliseconds, 9 calls, 314.278 microseconds average
Memory Usage
Model 892.591 KB RAM, 2.8815 GB VRAM
Context 92.2616 MB RAM, 1.20719 GB VRAM
Total 93.1333 MB RAM, 4.08869 GB VRAM
Context 92.2612 MB RAM, 1.14026 GB VRAM
Total 93.1329 MB RAM, 4.02176 GB VRAM
83 changes: 42 additions & 41 deletions SampleClips/columbia-large-vega7.txt
@@ -1,46 +1,47 @@
CPU Tasks
LoadModel 3.44286 seconds
RunComplete 174.677 seconds
Run 174.601 seconds
Callbacks 22.604 milliseconds, 44 calls, 513.727 microseconds average
Spectrogram 1.65973 seconds, 41 calls, 40.4812 milliseconds average
Sample 148.233 milliseconds, 527 calls, 281.276 microseconds average
Encode 110.192 seconds, 9 calls, 12.2436 seconds average
Decode 64.3834 seconds, 9 calls, 7.15371 seconds average
DecodeStep 64.2344 seconds, 527 calls, 121.887 milliseconds average
LoadModel 2.88964 seconds
RunComplete 140.747 seconds
Run 140.661 seconds
Callbacks 20.302 milliseconds, 44 calls, 461.409 microseconds average
Spectrogram 468.419 milliseconds, 41 calls, 11.4249 milliseconds average
Sample 139.558 milliseconds, 527 calls, 264.815 microseconds average
Encode 87.5396 seconds, 9 calls, 9.72662 seconds average
Decode 53.0971 seconds, 9 calls, 5.89968 seconds average
DecodeStep 52.9566 seconds, 527 calls, 100.487 milliseconds average
GPU Tasks
LoadModel 2.20374 seconds
Run 173.895 seconds
Encode 111.531 seconds, 9 calls, 12.3923 seconds average
EncodeLayer 96.2295 seconds, 288 calls, 334.13 milliseconds average
Decode 62.3642 seconds, 9 calls, 6.92936 seconds average
DecodeStep 62.3636 seconds, 527 calls, 118.337 milliseconds average
DecodeLayer 58.6225 seconds, 16864 calls, 3.47619 milliseconds average
LoadModel 1.86694 seconds
Run 140.175 seconds
Encode 88.7441 seconds, 9 calls, 9.86046 seconds average
EncodeLayer 75.809 seconds, 288 calls, 263.226 milliseconds average
Decode 51.4306 seconds, 9 calls, 5.71451 seconds average
DecodeStep 51.43 seconds, 527 calls, 97.5901 milliseconds average
DecodeLayer 48.1822 seconds, 16864 calls, 2.85711 milliseconds average
Compute Shaders
mulMatTiledEx 89.3411 seconds, 2880 calls, 31.0212 milliseconds average
mulMatTiled 25.4265 seconds, 3465 calls, 7.33809 milliseconds average
mulMatByRowTiled 22.2805 seconds, 166278 calls, 133.995 microseconds average
mulMatByRowTiledEx 13.8414 seconds, 33152 calls, 417.514 microseconds average
softMaxFixed 3.90482 seconds, 17152 calls, 227.66 microseconds average
addRepeatGelu 2.52778 seconds, 17170 calls, 147.221 microseconds average
norm 2.10933 seconds, 51704 calls, 40.7962 microseconds average
convolutionMain2Fixed 2.06899 seconds, 9 calls, 229.888 milliseconds average
matReshapePanels 1.99444 seconds, 1737 calls, 1.14821 milliseconds average
addRepeat 1.84752 seconds, 68896 calls, 26.816 microseconds average
fmaRepeat1 1.28479 seconds, 51704 calls, 24.849 microseconds average
copyConvert 1.23617 seconds, 34880 calls, 35.4406 microseconds average
softMax 1.11773 seconds, 17391 calls, 64.2704 microseconds average
scaleInPlace 848.371 milliseconds, 17152 calls, 49.4619 microseconds average
copyTranspose 796.781 milliseconds, 34304 calls, 23.227 microseconds average
addInPlace 733.523 milliseconds, 34304 calls, 21.383 microseconds average
addRepeatScale 727.214 milliseconds, 33728 calls, 21.5611 microseconds average
convolutionMain 535.149 milliseconds, 9 calls, 59.461 milliseconds average
add 525.766 milliseconds, 16873 calls, 31.1602 microseconds average
diagMaskInf 361.151 milliseconds, 16864 calls, 21.4155 microseconds average
convolutionPrep1 58.0177 milliseconds, 18 calls, 3.22321 milliseconds average
convolutionPrep2 30.1294 milliseconds, 18 calls, 1.67386 milliseconds average
addRows 1.8544 milliseconds, 527 calls, 3.51879 microseconds average
mulMatTiledEx 69.1011 seconds, 2880 calls, 23.9934 milliseconds average
mulMatTiled 21.009 seconds, 3465 calls, 6.06321 milliseconds average
mulMatByRowTiled 20.0965 seconds, 166278 calls, 120.861 microseconds average
mulMatByRowTiledEx 9.61326 seconds, 33152 calls, 289.975 microseconds average
softMaxFixed 3.7631 seconds, 17152 calls, 219.397 microseconds average
norm 2.23806 seconds, 51704 calls, 43.2859 microseconds average
convolutionMain2Fixed 2.12825 seconds, 9 calls, 236.472 milliseconds average
matReshapePanels 2.0333 seconds, 1737 calls, 1.17058 milliseconds average
addRepeatGelu 1.5491 seconds, 17170 calls, 90.2211 microseconds average
scaleInPlace 1.32928 seconds, 17152 calls, 77.5001 microseconds average
copyConvert 1.23135 seconds, 34880 calls, 35.3026 microseconds average
fmaRepeat1 1.10337 seconds, 51704 calls, 21.3401 microseconds average
addRepeatEx 1.00095 seconds, 51168 calls, 19.562 microseconds average
copyTranspose 846.807 milliseconds, 34304 calls, 24.6854 microseconds average
addRepeat 704.028 milliseconds, 17728 calls, 39.7128 microseconds average
softMaxLong 608.58 milliseconds, 527 calls, 1.1548 milliseconds average
convolutionMain 522.249 milliseconds, 9 calls, 58.0277 milliseconds average
addRepeatScale 500.937 milliseconds, 33728 calls, 14.8523 microseconds average
softMax 236.054 milliseconds, 16864 calls, 13.9975 microseconds average
diagMaskInf 171.964 milliseconds, 16864 calls, 10.1971 microseconds average
convolutionPrep1 60.7331 milliseconds, 18 calls, 3.37406 milliseconds average
convolutionPrep2 33.441 milliseconds, 18 calls, 1.85783 milliseconds average
add 12.0883 milliseconds, 9 calls, 1.34314 milliseconds average
addRows 1.9724 milliseconds, 527 calls, 3.74269 microseconds average
Memory Usage
Model 892.591 KB RAM, 2.8815 GB VRAM
Context 92.2617 MB RAM, 1.27432 GB VRAM
Total 93.1334 MB RAM, 4.15582 GB VRAM
Context 92.2612 MB RAM, 1.19934 GB VRAM
Total 93.1329 MB RAM, 4.08084 GB VRAM
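
The timing lines in these logs share one shape: a task name, a total time, a call count, and a per-call average, with units that vary between seconds, milliseconds, and microseconds. Below is a minimal sketch of how such lines could be compared between the old and new versions of a file. It is not part of the repository; `parse_line` and `speedup` are hypothetical helper names, and the regular expression assumes the exact layout seen above. Lines without a call count (for example `LoadModel` or the memory rows) are skipped.

```python
import re

# Multipliers that convert each time unit seen in the logs to seconds.
UNIT_SECONDS = {"seconds": 1.0, "milliseconds": 1e-3, "microseconds": 1e-6}

# Assumed layout: "<name> <total> <unit>, <calls> calls, <avg> <unit> average"
LINE_RE = re.compile(
    r"^\s*(?P<name>\w+)\s+(?P<total>[\d.]+) (?P<unit>\w+)"
    r", (?P<calls>\d+) calls, (?P<avg>[\d.]+) (?P<avg_unit>\w+) average\s*$"
)


def parse_line(line):
    """Return name, total and average (in seconds) and call count, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    if m["unit"] not in UNIT_SECONDS or m["avg_unit"] not in UNIT_SECONDS:
        return None
    return {
        "name": m["name"],
        "total_s": float(m["total"]) * UNIT_SECONDS[m["unit"]],
        "calls": int(m["calls"]),
        "avg_s": float(m["avg"]) * UNIT_SECONDS[m["avg_unit"]],
    }


def speedup(old_line, new_line):
    """Ratio of old total time to new total time for the same task."""
    old, new = parse_line(old_line), parse_line(new_line)
    return old["total_s"] / new["total_s"]


if __name__ == "__main__":
    # CPU-side DecodeStep lines from columbia-large-1080ti.txt, old then new.
    old = "DecodeStep 19.9774 seconds, 527 calls, 37.9077 milliseconds average"
    new = "DecodeStep 15.9829 seconds, 527 calls, 30.3281 milliseconds average"
    print(f"DecodeStep speedup: {speedup(old, new):.2f}x")
```

For the 1080 Ti `DecodeStep` pair this gives a ratio of about 1.25. Converting everything to seconds first also makes it easy to check that each reported average equals the total divided by the call count.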