Implement NLLLoss #3168

Open
wants to merge 100 commits into
base: develop

Conversation

hieule88
Collaborator

@hieule88 hieule88 commented Jul 29, 2024

  • Added forward and backward operations and kernels for contiguous NLLLoss with no reduction.
  • Added forward and backward operations and kernels for non-contiguous NLLLoss with no reduction.
  • Added forward and backward operations and kernels for NLLLoss with reduction.
  • Added a driver test and gtests for NLLLoss.
  • The new API is guarded by the MIOPEN_BETA_API macro.
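
For readers unfamiliar with the operation, here is a minimal pure-Python sketch of NLLLoss forward and backward for a 2-D input of shape (N, C). It assumes PyTorch-style semantics (ignored targets contribute zero loss and zero gradient, and `mean` divides by the total weight of the non-ignored targets). The function names are hypothetical; this is not the MIOpen kernel code or its API.

```python
def nll_loss_forward(log_probs, targets, weights=None,
                     ignore_index=-100, reduction="none"):
    """Reference NLLLoss forward for log_probs of shape (N, C).

    Assumption: PyTorch-style semantics, not taken from the MIOpen sources.
    """
    num_classes = len(log_probs[0])
    if weights is None:
        weights = [1.0] * num_classes
    losses = []
    total_weight = 0.0
    for row, t in zip(log_probs, targets):
        if t == ignore_index:
            # Ignored samples contribute nothing to the loss or the weight sum.
            losses.append(0.0)
            continue
        losses.append(-weights[t] * row[t])
        total_weight += weights[t]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        # Weighted mean over the non-ignored samples only.
        return sum(losses) / total_weight if total_weight else float("nan")
    raise ValueError(f"unknown reduction: {reduction}")


def nll_loss_backward(grad_output, targets, num_classes, weights=None,
                      ignore_index=-100, reduction="none"):
    """Gradient with respect to log_probs, returned as an (N, C) nested list.

    grad_output is a list of N values for reduction="none", else a scalar.
    """
    if weights is None:
        weights = [1.0] * num_classes
    total_weight = sum(weights[t] for t in targets if t != ignore_index)
    grad_input = [[0.0] * num_classes for _ in targets]
    for n, t in enumerate(targets):
        if t == ignore_index:
            continue
        if reduction == "none":
            upstream = grad_output[n]
        elif reduction == "sum":
            upstream = grad_output
        elif reduction == "mean":
            upstream = grad_output / total_weight
        else:
            raise ValueError(f"unknown reduction: {reduction}")
        # Only the target class of each sample receives a gradient.
        grad_input[n][t] = -weights[t] * upstream
    return grad_input


if __name__ == "__main__":
    log_probs = [[-0.5, -1.0, -2.0], [-1.5, -0.2, -3.0], [-0.1, -2.5, -3.5]]
    targets = [0, 255, 2]  # the second sample is ignored
    for mode in ("none", "sum", "mean"):
        print(mode, nll_loss_forward(log_probs, targets,
                                     ignore_index=255, reduction=mode))
```

The sketch covers only the (N, C) case. The benchmark shapes below also include inputs with extra spatial dimensions, where the same per-element rule applies at every spatial position.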
NLLLoss float16
op_name dtype size ignore_index contiguous reduction direction rocm_kernel_avg miopen_kernel_time improvement_over_rocm
NLLLoss float16 [16_21_512_512] 255 contiguous none fwd 890197 194222 4.583399409
NLLLoss float16 [16_21_512_512] 255 contiguous none bwd 1442306 359644 4.010371367
NLLLoss float16 [16_21_512_512] 255 noncontiguous none fwd 884532 500443 1.767497997
NLLLoss float16 [16_21_512_512] 255 noncontiguous none bwd 1442770 681580 2.116802136
NLLLoss float16 [64_21_254_333] 255 contiguous none fwd 1305054 225706 5.782097064
NLLLoss float16 [64_21_254_333] 255 contiguous none bwd 1819114 417190 4.360396941
NLLLoss float16 [64_21_254_333] 255 noncontiguous none fwd 1290158 377457 3.418026424
NLLLoss float16 [64_21_254_333] 255 noncontiguous none bwd 1828042 581190 3.145343175
NLLLoss float16 [64_21_213_331] 255 contiguous none fwd 1082360 184088 5.879579332
NLLLoss float16 [64_21_213_331] 255 contiguous none bwd 1517602 357759 4.241967358
NLLLoss float16 [64_21_213_331] 255 noncontiguous none fwd 1111929 277368 4.008858268
NLLLoss float16 [64_21_213_331] 255 noncontiguous none bwd 1510706 416906 3.623612997
NLLLoss float16 [64_21_240_332] 255 contiguous none fwd 1379231 209689 6.577507642
NLLLoss float16 [64_21_240_332] 255 contiguous none bwd 1801640 393102 4.583136183
NLLLoss float16 [64_21_240_332] 255 noncontiguous none fwd 1346462 350293 3.843816462
NLLLoss float16 [64_21_240_332] 255 noncontiguous none bwd 1808008 516301 3.501848728
NLLLoss float16 [64_21_212_320] 255 contiguous none fwd 1126393 192375 5.855194282
NLLLoss float16 [64_21_212_320] 255 contiguous none bwd 1501361 333056 4.507833517
NLLLoss float16 [64_21_212_320] 255 noncontiguous none fwd 1133625 264618 4.284005623
NLLLoss float16 [64_21_212_320] 255 noncontiguous none bwd 1504929 389283 3.865899615
NLLLoss float16 [64_21_218_333] 255 contiguous none fwd 1141113 190510 5.989780064
NLLLoss float16 [64_21_218_333] 255 contiguous none bwd 1594611 355903 4.480465183
NLLLoss float16 [64_21_218_333] 255 noncontiguous none fwd 1144889 297401 3.849647446
NLLLoss float16 [64_21_218_333] 255 noncontiguous none bwd 1594307 448163 3.557426651
NLLLoss float16 [64_21_270_333] 255 contiguous none fwd 1498160 235967 6.34902338
NLLLoss float16 [64_21_270_333] 255 contiguous none bwd 2033964 445394 4.566662326
NLLLoss float16 [64_21_270_333] 255 noncontiguous none fwd 1497568 427903 3.49978383
NLLLoss float16 [64_21_270_333] 255 noncontiguous none bwd 2029836 656137 3.09361612
NLLLoss float16 [64_21_237_329] 255 contiguous none fwd 1255186 204540 6.136628532
NLLLoss float16 [64_21_237_329] 255 contiguous none bwd 1724664 383268 4.499890416
NLLLoss float16 [64_21_237_329] 255 noncontiguous none fwd 1255423 341315 3.67819463
NLLLoss float16 [64_21_237_329] 255 noncontiguous none bwd 1727381 516541 3.344131444
NLLLoss float16 [64_21_225_246] 255 contiguous none fwd 833459 145753 5.718297394
NLLLoss float16 [64_21_225_246] 255 contiguous none bwd 1186100 269712 4.397653794
NLLLoss float16 [64_21_225_246] 255 noncontiguous none fwd 821154 191333 4.291753122
NLLLoss float16 [64_21_225_246] 255 noncontiguous none bwd 1185523 280467 4.226960748
NLLLoss float16 [64_21_240_292] 255 contiguous none fwd 1126461 185379 6.076529704
NLLLoss float16 [64_21_240_292] 255 contiguous none bwd 1543947 346670 4.453650446
NLLLoss float16 [64_21_240_292] 255 noncontiguous none fwd 1137307 281144 4.045282844
NLLLoss float16 [64_21_240_292] 255 noncontiguous none bwd 1543113 411521 3.749779477
NLLLoss float16 [64_21_288_303] 255 contiguous none fwd 1579586 229698 6.876794748
NLLLoss float16 [64_21_288_303] 255 contiguous none bwd 2035438 435930 4.66918542
NLLLoss float16 [64_21_288_303] 255 noncontiguous none fwd 1575681 379915 4.147456668
NLLLoss float16 [64_21_288_303] 255 noncontiguous none bwd 2045004 575659 3.552457271
NLLLoss float16 [64_21_274_275] 255 contiguous none fwd 1291295 197255 6.546323287
NLLLoss float16 [64_21_274_275] 255 contiguous none bwd 1718361 371632 4.623824106
NLLLoss float16 [64_21_274_275] 255 noncontiguous none fwd 1284766 291333 4.409956991
NLLLoss float16 [64_21_274_275] 255 noncontiguous none bwd 1716184 426564 4.02327435
NLLLoss float16 [64_21_273_322] 255 contiguous none fwd 1480952 232562 6.367987891
NLLLoss float16 [64_21_273_322] 255 contiguous none bwd 1981424 442885 4.473901803
NLLLoss float16 [64_21_273_322] 255 noncontiguous none fwd 1471384 413926 3.554703015
NLLLoss float16 [64_21_273_322] 255 noncontiguous none bwd 1996239 624818 3.194912759
NLLLoss float16 [64_21_240_320] 255 contiguous none fwd 1309257 206821 6.330387146
NLLLoss float16 [64_21_240_320] 255 contiguous none bwd 1718289 383173 4.484368679
NLLLoss float16 [64_21_240_320] 255 noncontiguous none fwd 1312616 333023 3.941517553
NLLLoss float16 [64_21_240_320] 255 noncontiguous none bwd 1715729 492006 3.487211538
NLLLoss float16 [64_21_238_269] 255 contiguous none fwd 1023852 165365 6.19146736
NLLLoss float16 [64_21_238_269] 255 contiguous none bwd 1409989 311922 4.520325594
NLLLoss float16 [64_21_238_269] 255 noncontiguous none fwd 1012924 246679 4.106243336
NLLLoss float16 [64_21_238_269] 255 noncontiguous none bwd 1402437 370783 3.782365966
NLLLoss float16 [64_21_213_326] 255 contiguous none fwd 1066155 182556 5.840153158
NLLLoss float16 [64_21_213_326] 255 contiguous none bwd 1502498 355761 4.223335329
NLLLoss float16 [64_21_213_326] 255 noncontiguous none fwd 1069627 266537 4.013052597
NLLLoss float16 [64_21_213_326] 255 noncontiguous none bwd 1506722 391210 3.851440403
NLLLoss float16 [64_21_297_333] 255 contiguous none fwd 1792364 267053 6.711641509
NLLLoss float16 [64_21_297_333] 255 contiguous none bwd 2313857 511777 4.521221157
NLLLoss float16 [64_21_297_333] 255 noncontiguous none fwd 1787788 499119 3.581887285
NLLLoss float16 [64_21_297_333] 255 noncontiguous none bwd 2314881 762438 3.036156383
NLLLoss float16 [64_21_212_303] 255 contiguous none fwd 1016907 168779 6.025080134
NLLLoss float16 [64_21_212_303] 255 contiguous none bwd 1383940 312332 4.4309901
NLLLoss float16 [64_21_212_303] 255 noncontiguous none fwd 1006076 242840 4.142958326
NLLLoss float16 [64_21_212_303] 255 noncontiguous none bwd 1383028 363976 3.799778007
NLLLoss float16 [64_21_230_335] 255 contiguous none fwd 1221639 202734 6.025822013
NLLLoss float16 [64_21_230_335] 255 contiguous none bwd 1686190 379371 4.444699252
NLLLoss float16 [64_21_230_335] 255 noncontiguous none fwd 1249142 315906 3.954157249
NLLLoss float16 [64_21_230_335] 255 noncontiguous none bwd 1684766 479459 3.513889613
NLLLoss float16 [64_21_198_257] 255 contiguous none fwd 703762 135767 5.183601317
NLLLoss float16 [64_21_198_257] 255 contiguous none bwd 1074762 248974 4.316763999
NLLLoss float16 [64_21_198_257] 255 noncontiguous none fwd 702306 165082 4.254285749
NLLLoss float16 [64_21_198_257] 255 noncontiguous none bwd 1080826 242378 4.459257853
NLLLoss float16 [64_21_283_320] 255 contiguous none fwd 1647087 239801 6.868557679
NLLLoss float16 [64_21_283_320] 255 contiguous none bwd 2113253 451584 4.679645426
NLLLoss float16 [64_21_283_320] 255 noncontiguous none fwd 1629423 434784 3.747660907
NLLLoss float16 [64_21_283_320] 255 noncontiguous none bwd 3439003 647705 5.309520538
NLLLoss float16 [64_21_175_333] 255 contiguous none fwd 812464 150825 5.386799271
NLLLoss float16 [64_21_175_333] 255 contiguous none bwd 1177497 293969 4.005514187
NLLLoss float16 [64_21_175_333] 255 noncontiguous none fwd 786705 211588 3.71809838
NLLLoss float16 [64_21_175_333] 255 noncontiguous none bwd 1180153 307035 3.843708372
NLLLoss float16 [64_21_267_326] 255 contiguous none fwd 1471443 233277 6.307707146
NLLLoss float16 [64_21_267_326] 255 contiguous none bwd 1980842 439140 4.510730063
NLLLoss float16 [64_21_267_326] 255 noncontiguous none fwd 1470628 404048 3.639735873
NLLLoss float16 [64_21_267_326] 255 noncontiguous none bwd 1980538 605503 3.270897089
NLLLoss float16 [32_21_256_256] 255 contiguous none fwd 377577 90399 4.176782929
NLLLoss float16 [32_21_256_256] 255 contiguous none bwd 715747 154416 4.635186768
NLLLoss float16 [32_21_256_256] 255 noncontiguous none fwd 375193 130788 2.868711197
NLLLoss float16 [32_21_256_256] 255 noncontiguous none bwd 720051 178166 4.041461334
NLLLoss float16 [55_21_112_257] 255 contiguous none fwd 241339 71466 3.376976464
NLLLoss float16 [55_21_112_257] 255 contiguous none bwd 455927 122843 3.711460971
NLLLoss float16 [55_21_112_257] 255 noncontiguous none fwd 247467 95128 2.601410731
NLLLoss float16 [55_21_112_257] 255 noncontiguous none bwd 456295 158238 2.883599388
NLLLoss float16 [24_21_512_512] 255 contiguous none fwd 1547971 294289 5.260036903
NLLLoss float16 [24_21_512_512] 255 contiguous none bwd 2458274 543548 4.522643814
NLLLoss float16 [24_21_512_512] 255 noncontiguous none fwd 1549924 692773 2.237275413
NLLLoss float16 [24_21_512_512] 255 noncontiguous none bwd 2461298 1170136 2.103429003
NLLLoss float16 [16_21_512_512] 255 noncontiguous sum fwd 515007 544227 0.946309169
NLLLoss float16 [16_21_512_512] 255 noncontiguous sum bwd 1275629 679031 1.878602008
NLLLoss float16 [64_21_254_333] 255 noncontiguous sum fwd 652399 438811 1.486742584
NLLLoss float16 [64_21_254_333] 255 noncontiguous sum bwd 776494 529771 1.465716319
NLLLoss float16 [64_21_213_331] 255 noncontiguous sum fwd 545551 331494 1.645734161
NLLLoss float16 [64_21_213_331] 255 noncontiguous sum bwd 643598 381899 1.685257097
NLLLoss float16 [64_21_240_332] 255 noncontiguous sum fwd 655326 404637 1.619540477
NLLLoss float16 [64_21_240_332] 255 noncontiguous sum bwd 710942 479328 1.483205655
NLLLoss float16 [64_21_212_320] 255 noncontiguous sum fwd 535919 316681 1.692299191
NLLLoss float16 [64_21_212_320] 255 noncontiguous sum bwd 588047 362688 1.621357751
NLLLoss float16 [64_21_218_333] 255 noncontiguous sum fwd 561151 352492 1.59195386
NLLLoss float16 [64_21_218_333] 255 noncontiguous sum bwd 659471 404015 1.632293355
NLLLoss float16 [64_21_270_333] 255 noncontiguous sum fwd 692127 490305 1.411625417
NLLLoss float16 [64_21_270_333] 255 noncontiguous sum bwd 884462 623080 1.419499904
NLLLoss float16 [64_21_237_329] 255 noncontiguous sum fwd 601838 398759 1.509277534
NLLLoss float16 [64_21_237_329] 255 noncontiguous sum bwd 708975 459503 1.542917021
NLLLoss float16 [64_21_225_246] 255 noncontiguous sum fwd 423295 235461 1.797728711
NLLLoss float16 [64_21_225_246] 255 noncontiguous sum bwd 532335 263282 2.021919463
NLLLoss float16 [64_21_240_292] 255 noncontiguous sum fwd 575871 333611 1.726175096
NLLLoss float16 [64_21_240_292] 255 noncontiguous sum bwd 623455 375868 1.658707312
NLLLoss float16 [64_21_288_303] 255 noncontiguous sum fwd 683230 441630 1.547064285
NLLLoss float16 [64_21_288_303] 255 noncontiguous sum bwd 777983 537060 1.44859606
NLLLoss float16 [64_21_274_275] 255 noncontiguous sum fwd 581791 348283 1.670454774
NLLLoss float16 [64_21_274_275] 255 noncontiguous sum bwd 687679 389313 1.766391053
NLLLoss float16 [64_21_273_322] 255 noncontiguous sum fwd 676431 474720 1.424905207
NLLLoss float16 [64_21_273_322] 255 noncontiguous sum bwd 859294 576320 1.491001527
NLLLoss float16 [64_21_240_320] 255 noncontiguous sum fwd 605343 390117 1.55169603
NLLLoss float16 [64_21_240_320] 255 noncontiguous sum bwd 672943 446651 1.506641651
NLLLoss float16 [64_21_238_269] 255 noncontiguous sum fwd 487567 296820 1.642635267
NLLLoss float16 [64_21_238_269] 255 noncontiguous sum bwd 604031 338829 1.782701599
NLLLoss float16 [64_21_213_326] 255 noncontiguous sum fwd 537567 319505 1.682499491
NLLLoss float16 [64_21_213_326] 255 noncontiguous sum bwd 639967 360644 1.774511707
NLLLoss float16 [64_21_297_333] 255 noncontiguous sum fwd 757167 564007 1.342478019
NLLLoss float16 [64_21_297_333] 255 noncontiguous sum bwd 970286 727191 1.334293191
NLLLoss float16 [64_21_212_303] 255 noncontiguous sum fwd 489663 295791 1.655435764
NLLLoss float16 [64_21_212_303] 255 noncontiguous sum bwd 629135 347543 1.810236431
NLLLoss float16 [64_21_230_335] 255 noncontiguous sum fwd 594815 370157 1.60692625
NLLLoss float16 [64_21_230_335] 255 noncontiguous sum bwd 699903 450798 1.552586746
NLLLoss float16 [64_21_198_257] 255 noncontiguous sum fwd 389359 209462 1.85885268
NLLLoss float16 [64_21_198_257] 255 noncontiguous sum bwd 490159 228324 2.146769503
NLLLoss float16 [64_21_283_320] 255 noncontiguous sum fwd 935598 496238 1.885381611
NLLLoss float16 [64_21_283_320] 255 noncontiguous sum bwd 879231 617742 1.423298076
NLLLoss float16 [64_21_175_333] 255 noncontiguous sum fwd 446271 259342 1.720781825
NLLLoss float16 [64_21_175_333] 255 noncontiguous sum bwd 562399 287820 1.953995553
NLLLoss float16 [64_21_267_326] 255 noncontiguous sum fwd 685007 463703 1.477253759
NLLLoss float16 [64_21_267_326] 255 noncontiguous sum bwd 789407 566435 1.39364093
NLLLoss float16 [32_21_256_256] 255 noncontiguous sum fwd 261600 161129 1.623543869
NLLLoss float16 [32_21_256_256] 255 noncontiguous sum bwd 1019871 163813 6.225824568
NLLLoss float16 [55_21_112_257] 255 noncontiguous sum fwd 195296 123460 1.581856472
NLLLoss float16 [55_21_112_257] 255 noncontiguous sum bwd 239472 145450 1.646421451
NLLLoss float16 [24_21_512_512] 255 noncontiguous sum fwd 770287 756619 1.018064574
NLLLoss float16 [24_21_512_512] 255 noncontiguous sum bwd 2092652 1133773 1.845741608
NLLLoss float16 [16_21_512_512] 255 noncontiguous mean fwd 515728 543328 0.949201955
NLLLoss float16 [16_21_512_512] 255 noncontiguous mean bwd 1272696 676817 1.88041376
NLLLoss float16 [64_21_254_333] 255 noncontiguous mean fwd 651435 438283 1.486334172
NLLLoss float16 [64_21_254_333] 255 noncontiguous mean bwd 780567 530441 1.471543489
NLLLoss float16 [64_21_213_331] 255 noncontiguous mean fwd 544879 332171 1.640356925
NLLLoss float16 [64_21_213_331] 255 noncontiguous mean bwd 643564 382268 1.68354139
NLLLoss float16 [64_21_240_332] 255 noncontiguous mean fwd 655675 404402 1.621344603
NLLLoss float16 [64_21_240_332] 255 noncontiguous mean bwd 708618 479849 1.476752062
NLLLoss float16 [64_21_212_320] 255 noncontiguous mean fwd 535471 317204 1.688096619
NLLLoss float16 [64_21_212_320] 255 noncontiguous mean bwd 591165 365008 1.619594639
NLLLoss float16 [64_21_218_333] 255 noncontiguous mean fwd 561086 352938 1.589757974
NLLLoss float16 [64_21_218_333] 255 noncontiguous mean bwd 662635 401044 1.652275062
NLLLoss float16 [64_21_270_333] 255 noncontiguous mean fwd 691773 489880 1.41212746
NLLLoss float16 [64_21_270_333] 255 noncontiguous mean bwd 880328 623176 1.41264747
NLLLoss float16 [64_21_237_329] 255 noncontiguous mean fwd 601410 398575 1.508900458
NLLLoss float16 [64_21_237_329] 255 noncontiguous mean bwd 708271 461223 1.535636774
NLLLoss float16 [64_21_225_246] 255 noncontiguous mean fwd 425783 236301 1.80186711
NLLLoss float16 [64_21_225_246] 255 noncontiguous mean bwd 533637 261118 2.043662252
NLLLoss float16 [64_21_240_292] 255 noncontiguous mean fwd 580692 333652 1.740412166
NLLLoss float16 [64_21_240_292] 255 noncontiguous mean bwd 640611 373954 1.713074335
NLLLoss float16 [64_21_288_303] 255 noncontiguous mean fwd 684179 442061 1.547702693
NLLLoss float16 [64_21_288_303] 255 noncontiguous mean bwd 801441 542967 1.47603998
NLLLoss float16 [64_21_274_275] 255 noncontiguous mean fwd 582661 352088 1.654873214
NLLLoss float16 [64_21_274_275] 255 noncontiguous mean bwd 690931 398400 1.734264558
NLLLoss float16 [64_21_273_322] 255 noncontiguous mean fwd 675460 473333 1.427029174
NLLLoss float16 [64_21_273_322] 255 noncontiguous mean bwd 858129 576213 1.489256577
NLLLoss float16 [64_21_240_320] 255 noncontiguous mean fwd 605142 388836 1.556291084
NLLLoss float16 [64_21_240_320] 255 noncontiguous mean bwd 672821 441387 1.524333521
NLLLoss float16 [64_21_238_269] 255 noncontiguous mean fwd 499512 296552 1.684399363
NLLLoss float16 [64_21_238_269] 255 noncontiguous mean bwd 602662 339112 1.777176862
NLLLoss float16 [64_21_213_326] 255 noncontiguous mean fwd 537767 319041 1.685573328
NLLLoss float16 [64_21_213_326] 255 noncontiguous mean bwd 638597 359059 1.778529434
NLLLoss float16 [64_21_297_333] 255 noncontiguous mean fwd 758500 565015 1.342442236
NLLLoss float16 [64_21_297_333] 255 noncontiguous mean bwd 968176 728660 1.32870749
NLLLoss float16 [64_21_212_303] 255 noncontiguous mean fwd 491768 293975 1.672822519
NLLLoss float16 [64_21_212_303] 255 noncontiguous mean bwd 614742 351486 1.748980045
NLLLoss float16 [64_21_230_335] 255 noncontiguous mean fwd 594615 370775 1.603708449
NLLLoss float16 [64_21_230_335] 255 noncontiguous mean bwd 700693 454598 1.54134642
NLLLoss float16 [64_21_198_257] 255 noncontiguous mean fwd 392634 209068 1.878020548
NLLLoss float16 [64_21_198_257] 255 noncontiguous mean bwd 490488 227112 2.159674522
NLLLoss float16 [64_21_283_320] 255 noncontiguous mean fwd 740148 496466 1.490833209
NLLLoss float16 [64_21_283_320] 255 noncontiguous mean bwd 869042 613942 1.415511563
NLLLoss float16 [64_21_175_333] 255 noncontiguous mean fwd 446313 257797 1.73125754
NLLLoss float16 [64_21_175_333] 255 noncontiguous mean bwd 562855 290846 1.935233766
NLLLoss float16 [64_21_267_326] 255 noncontiguous mean fwd 687894 461888 1.489309097
NLLLoss float16 [64_21_267_326] 255 noncontiguous mean bwd 789715 558493 1.414010561
NLLLoss float16 [32_21_256_256] 255 noncontiguous mean fwd 259788 161708 1.606525342
NLLLoss float16 [32_21_256_256] 255 noncontiguous mean bwd 1026048 163824 6.263111632
NLLLoss float16 [55_21_112_257] 255 noncontiguous mean fwd 195085 123112 1.58461401
NLLLoss float16 [55_21_112_257] 255 noncontiguous mean bwd 238188 145904 1.632498081
NLLLoss float16 [24_21_512_512] 255 noncontiguous mean fwd 767252 755740 1.015232752
NLLLoss float16 [24_21_512_512] 255 noncontiguous mean bwd 2069663 1137787 1.819025002
NLLLoss float16 [2_3_128_128_128] -100 contiguous none fwd 95492 90452 1.055720161
NLLLoss float16 [2_3_128_128_128] -100 contiguous none bwd 164775 93385 1.764469669
NLLLoss float16 [2_3_128_128_128] -100 noncontiguous none fwd 172297 250431 0.688001885
NLLLoss float16 [2_3_128_128_128] -100 noncontiguous none bwd 165708 233578 0.709433251
NLLLoss float16 [2_3_128_128_128] -100 contiguous mean fwd 163943 136851 1.197967132
NLLLoss float16 [2_3_128_128_128] -100 contiguous mean bwd 198872 86416 2.301333086
NLLLoss float16 [2_3_128_128_128] -100 noncontiguous mean fwd 245028 312758 0.78344279
NLLLoss float16 [2_3_128_128_128] -100 noncontiguous mean bwd 199280 229258 0.869239023
NLLLoss float16 [2_3_128_128_128] -100 contiguous sum fwd 160022 137295 1.165534069
NLLLoss float16 [2_3_128_128_128] -100 contiguous sum bwd 197592 86505 2.284168545
NLLLoss float16 [2_3_128_128_128] -100 noncontiguous sum fwd 239616 319514 0.74993897
NLLLoss float16 [2_3_128_128_128] -100 noncontiguous sum bwd 196415 226680 0.866485795
NLLLoss float16 [256_81_8732] -100 contiguous none fwd 396607 167588 2.366559658
NLLLoss float16 [256_81_8732] -100 contiguous none bwd 891761 376900 2.36604139
NLLLoss float16 [256_81_8732] -100 noncontiguous none fwd 1414219 91216 15.50406727
NLLLoss float16 [256_81_8732] -100 noncontiguous none bwd 892265 122469 7.285639631
NLLLoss float16 [256_81_8732] -100 contiguous mean fwd 175014 202556 0.864027726
NLLLoss float16 [256_81_8732] -100 contiguous mean bwd 770860 374660 2.057492126
NLLLoss float16 [256_81_8732] -100 noncontiguous mean fwd 1199184 127517 9.404110824
NLLLoss float16 [256_81_8732] -100 noncontiguous mean bwd 769962 108603 7.089693655
NLLLoss float16 [256_81_8732] -100 contiguous sum fwd 169237 201632 0.839336018
NLLLoss float16 [256_81_8732] -100 contiguous sum bwd 769451 377789 2.036721556
NLLLoss float16 [256_81_8732] -100 noncontiguous sum fwd 1194572 127623 9.360162353
NLLLoss float16 [256_81_8732] -100 noncontiguous sum bwd 767677 109154 7.032971765
NLLLoss float16 [256_100] -100 contiguous none fwd 10161 9298 1.092815659
NLLLoss float16 [256_100] -100 contiguous none bwd 14592 8000 1.824
NLLLoss float16 [256_100] -100 noncontiguous none fwd 16873 9209 1.832229341
NLLLoss float16 [256_100] -100 noncontiguous none bwd 20219 8035 2.516365899
NLLLoss float16 [256_100] -100 contiguous mean fwd 15808 15556 1.016199537
NLLLoss float16 [256_100] -100 contiguous mean bwd 14352 7058 2.033437234
NLLLoss float16 [256_100] -100 noncontiguous mean fwd 19128 15893 1.203548732
NLLLoss float16 [256_100] -100 noncontiguous mean bwd 21062 6844 3.077440094
NLLLoss float16 [256_100] -100 contiguous sum fwd 15568 15697 0.991781869
NLLLoss float16 [256_100] -100 contiguous sum bwd 12752 7058 1.80674412
NLLLoss float16 [256_100] -100 noncontiguous sum fwd 18488 16249 1.137793095
NLLLoss float16 [256_100] -100 noncontiguous sum bwd 20350 7058 2.883253046
NLLLoss float16 [40_2] -100 contiguous none fwd 8688 7449 1.166331051
NLLLoss float16 [40_2] -100 contiguous none bwd 14720 7253 2.029505032
NLLLoss float16 [40_2] -100 noncontiguous none fwd 11782 7396 1.593023256
NLLLoss float16 [40_2] -100 noncontiguous none bwd 17018 7289 2.334750995
NLLLoss float16 [40_2] -100 contiguous mean fwd 8256 13511 0.611057657
NLLLoss float16 [40_2] -100 contiguous mean bwd 8880 6578 1.349954393
NLLLoss float16 [40_2] -100 noncontiguous mean fwd 12102 14329 0.84458092
NLLLoss float16 [40_2] -100 noncontiguous mean bwd 14705 6400 2.29765625
NLLLoss float16 [40_2] -100 contiguous sum fwd 7776 14187 0.548107422
NLLLoss float16 [40_2] -100 contiguous sum bwd 7728 6382 1.210905672
NLLLoss float16 [40_2] -100 noncontiguous sum fwd 11985 13689 0.875520491
NLLLoss float16 [40_2] -100 noncontiguous sum bwd 13323 6418 2.075880337
NLLLoss float16 [8192_52100] -100 contiguous none fwd 28769 14437 1.992727021
NLLLoss float16 [8192_52100] -100 contiguous none bwd 784551 14260 55.01760168
NLLLoss float16 [8192_52100] -100 noncontiguous none fwd 2435181 14579 167.0334728
NLLLoss float16 [8192_52100] -100 noncontiguous none bwd 3194068 14455 220.9663092
NLLLoss float16 [8192_52100] -100 contiguous mean fwd 407147 24749 16.45104853
NLLLoss float16 [8192_52100] -100 contiguous mean bwd 1029516 12855 80.08681447
NLLLoss float16 [8192_52100] -100 noncontiguous mean fwd 2759594 24642 111.9874199
NLLLoss float16 [8192_52100] -100 noncontiguous mean bwd 3429560 12783 268.2906986
NLLLoss float16 [8192_52100] -100 contiguous sum fwd 2491153 24944 99.86982842
NLLLoss float16 [8192_52100] -100 contiguous sum bwd 1029995 12836 80.24267685
NLLLoss float16 [8192_52100] -100 noncontiguous sum fwd 2760448 25104 109.9604844
NLLLoss float16 [8192_52100] -100 noncontiguous sum bwd 3429032 12961 264.5653885
NLLLoss float16 [20480_50000] -100 contiguous none fwd 35521 20285 1.75109687
NLLLoss float16 [20480_50000] -100 contiguous none bwd 1852737 21547 85.9858449
NLLLoss float16 [20480_50000] -100 noncontiguous none fwd 5804885 19752 293.888467
NLLLoss float16 [20480_50000] -100 noncontiguous none bwd 7600988 21049 361.1092213
NLLLoss float16 [20480_50000] -100 contiguous mean fwd 989040 22027 44.90125755
NLLLoss float16 [20480_50000] -100 contiguous mean bwd 2420295 18898 128.071489
NLLLoss float16 [20480_50000] -100 noncontiguous mean fwd 6632306 28800 230.2884028
NLLLoss float16 [20480_50000] -100 noncontiguous mean bwd 8166788 19449 419.9078616
NLLLoss float16 [20480_50000] -100 contiguous sum fwd 987278 28125 35.10321778
NLLLoss float16 [20480_50000] -100 contiguous sum bwd 2418932 19289 125.4047385
NLLLoss float16 [20480_50000] -100 noncontiguous sum fwd 6631614 27697 239.4343792
NLLLoss float16 [20480_50000] -100 noncontiguous sum bwd 8173236 19164 426.489042
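
The last column of the rows above appears to equal rocm_kernel_avg divided by miopen_kernel_time (for example, 14592 / 8000 = 1.824), so values below 1 mark cases where the MIOpen kernel was slower. The following is a small sketch for parsing one benchmark row and recomputing that ratio. The function name `parse_row` and the dictionary keys are hypothetical, and the time unit is left unspecified because the table does not state it.

```python
def parse_row(line):
    """Parse one whitespace-separated benchmark row into a dict.

    The size may be written as "[16_21_512_512]" or as bare integers
    ("256 100"), so the fixed-position fields are read from the right.
    """
    fields = line.split()
    rocm_time = int(fields[-3])
    miopen_time = int(fields[-2])
    # Everything between dtype and ignore_index belongs to the size.
    dims = [int(d) for f in fields[2:-7] for d in f.strip("[]").split("_")]
    return {
        "op_name": fields[0],
        "dtype": fields[1],
        "size": dims,
        "ignore_index": int(fields[-7]),
        "layout": fields[-6],
        "reduction": fields[-5],
        "direction": fields[-4],
        "rocm_time": rocm_time,
        "miopen_time": miopen_time,
        "reported_improvement": float(fields[-1]),
        # Recomputed ratio: > 1 means the MIOpen kernel was faster.
        "speedup": rocm_time / miopen_time,
    }


if __name__ == "__main__":
    row = parse_row(
        "NLLLoss float16 [16_21_512_512] 255 contiguous none fwd "
        "890197 194222 4.583399409"
    )
    print(row["size"], row["reduction"], row["direction"], row["speedup"])
```

Reading the fixed fields from the right keeps the parser tolerant of both size notations used in the table.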
NLLLoss float32
op_name dtype size ignore_index contiguous reduction direction rocm_kernel_avg miopen_kernel_time improvement_over_rocm
NLLLoss float32 [16_21_512_512] 255 contiguous none fwd 998583 272498 3.664551666
NLLLoss float32 [16_21_512_512] 255 contiguous none bwd 1543684 600994 2.568551433
NLLLoss float32 [16_21_512_512] 255 noncontiguous none fwd 996327 564923 1.763650975
NLLLoss float32 [16_21_512_512] 255 noncontiguous none bwd 1548868 828425 1.869653861
NLLLoss float32 [64_21_254_333] 255 contiguous none fwd 1461617 325279 4.493425644
NLLLoss float32 [64_21_254_333] 255 contiguous none bwd 2081951 691963 3.008760584
NLLLoss float32 [64_21_254_333] 255 noncontiguous none fwd 1456849 484248 3.008477061
NLLLoss float32 [64_21_254_333] 255 noncontiguous none bwd 2090832 781438 2.675621099
NLLLoss float32 [64_21_213_331] 255 contiguous none fwd 1228539 273670 4.489125589
NLLLoss float32 [64_21_213_331] 255 contiguous none bwd 1672966 579110 2.888857039
NLLLoss float32 [64_21_213_331] 255 noncontiguous none fwd 1224780 359324 3.408567198
NLLLoss float32 [64_21_213_331] 255 noncontiguous none bwd 1676741 594328 2.821238441
NLLLoss float32 [64_21_240_332] 255 contiguous none fwd 1517298 296835 5.111587245
NLLLoss float32 [64_21_240_332] 255 contiguous none bwd 1995340 645776 3.089833007
NLLLoss float32 [64_21_240_332] 255 noncontiguous none fwd 1514402 445030 3.402921151
NLLLoss float32 [64_21_240_332] 255 noncontiguous none bwd 1978236 718683 2.752584937
NLLLoss float32 [64_21_212_320] 255 contiguous none fwd 1293980 278786 4.64148128
NLLLoss float32 [64_21_212_320] 255 contiguous none bwd 1649668 552187 2.98751691
NLLLoss float32 [64_21_212_320] 255 noncontiguous none fwd 1298509 343066 3.785012213
NLLLoss float32 [64_21_212_320] 255 noncontiguous none bwd 1651108 563742 2.928836241
NLLLoss float32 [64_21_218_333] 255 contiguous none fwd 1262620 280941 4.494253242
NLLLoss float32 [64_21_218_333] 255 contiguous none bwd 1767446 592581 2.982623473
NLLLoss float32 [64_21_218_333] 255 noncontiguous none fwd 1261515 385982 3.268325984
NLLLoss float32 [64_21_218_333] 255 noncontiguous none bwd 1758406 629699 2.792454808
NLLLoss float32 [64_21_270_333] 255 contiguous none fwd 1583442 346628 4.568130676
NLLLoss float32 [64_21_270_333] 255 contiguous none bwd 2216912 736685 3.009307913
NLLLoss float32 [64_21_270_333] 255 noncontiguous none fwd 1587010 554136 2.863935929
NLLLoss float32 [64_21_270_333] 255 noncontiguous none bwd 2213200 883575 2.504824152
NLLLoss float32 [64_21_237_329] 255 contiguous none fwd 1382030 300731 4.595568797
NLLLoss float32 [64_21_237_329] 255 contiguous none bwd 1885315 646243 2.91734688
NLLLoss float32 [64_21_237_329] 255 noncontiguous none fwd 1385036 442253 3.13177299
NLLLoss float32 [64_21_237_329] 255 noncontiguous none bwd 1881984 713173 2.63888846
NLLLoss float32 [64_21_225_246] 255 contiguous none fwd 925105 213555 4.331928543
NLLLoss float32 [64_21_225_246] 255 contiguous none bwd 1300289 454948 2.858104663
NLLLoss float32 [64_21_225_246] 255 noncontiguous none fwd 915312 257375 3.556336085
NLLLoss float32 [64_21_225_246] 255 noncontiguous none bwd 1282592 433385 2.959474832
NLLLoss float32 [64_21_240_292] 255 contiguous none fwd 1242777 259473 4.789619729
NLLLoss float32 [64_21_240_292] 255 contiguous none bwd 1687463 557738 3.025547838
NLLLoss float32 [64_21_240_292] 255 noncontiguous none fwd 1248265 356110 3.505279268
NLLLoss float32 [64_21_240_292] 255 noncontiguous none bwd 1684518 583959 2.884651148
NLLLoss float32 [64_21_288_303] 255 contiguous none fwd 1673182 337997 4.950286541
NLLLoss float32 [64_21_288_303] 255 contiguous none bwd 2164570 734819 2.945718606
NLLLoss float32 [64_21_288_303] 255 noncontiguous none fwd 1703533 490223 3.475016472
NLLLoss float32 [64_21_288_303] 255 noncontiguous none bwd 2186519 780062 2.803006684
NLLLoss float32 [64_21_274_275] 255 contiguous none fwd 1406124 290160 4.846029777
NLLLoss float32 [64_21_274_275] 255 contiguous none bwd 1921637 618558 3.106639959
NLLLoss float32 [64_21_274_275] 255 noncontiguous none fwd 1420923 377215 3.766878305
NLLLoss float32 [64_21_274_275] 255 noncontiguous none bwd 1913316 614541 3.113406591
NLLLoss float32 [64_21_273_322] 255 contiguous none fwd 1597557 338675 4.717079796
NLLLoss float32 [64_21_273_322] 255 contiguous none bwd 2167996 719625 3.012674657
NLLLoss float32 [64_21_273_322] 255 noncontiguous none fwd 1596805 527417 3.027594863
NLLLoss float32 [64_21_273_322] 255 noncontiguous none bwd 2169547 840493 2.581279083
NLLLoss float32 [64_21_240_320] 255 contiguous none fwd 1429974 286766 4.986553497
NLLLoss float32 [64_21_240_320] 255 contiguous none bwd 1861806 624145 2.982970303
NLLLoss float32 [64_21_240_320] 255 noncontiguous none fwd 2241110 423776 5.28843068
NLLLoss float32 [64_21_240_320] 255 noncontiguous none bwd 1861757 669370 2.781357097
NLLLoss float32 [64_21_238_269] 255 contiguous none fwd 1090139 246092 4.429802675
NLLLoss float32 [64_21_238_269] 255 contiguous none bwd 1504515 521375 2.885667706
NLLLoss float32 [64_21_238_269] 255 noncontiguous none fwd 1083899 324544 3.339759786
NLLLoss float32 [64_21_238_269] 255 noncontiguous none bwd 1508515 543259 2.776787867
NLLLoss float32 [64_21_213_326] 255 contiguous none fwd 1202984 267497 4.497186884
NLLLoss float32 [64_21_213_326] 255 contiguous none bwd 1647695 574690 2.867102264
NLLLoss float32 [64_21_213_326] 255 noncontiguous none fwd 1206360 345397 3.492676543
NLLLoss float32 [64_21_213_326] 255 noncontiguous none bwd 1647327 575650 2.861681577
NLLLoss float32 [64_21_297_333] 255 contiguous none fwd 1845434 387975 4.756579677
NLLLoss float32 [64_21_297_333] 255 contiguous none bwd 2477646 827113 2.995535072
NLLLoss float32 [64_21_297_333] 255 noncontiguous none fwd 1878650 626921 2.996629559
NLLLoss float32 [64_21_297_333] 255 noncontiguous none bwd 2474302 972888 2.543254722
NLLLoss float32 [64_21_212_303] 255 contiguous none fwd 1102377 256421 4.299090168
NLLLoss float32 [64_21_212_303] 255 contiguous none bwd 1517361 540737 2.806097974
NLLLoss float32 [64_21_212_303] 255 noncontiguous none fwd 1094537 320012 3.420299864
NLLLoss float32 [64_21_212_303] 255 noncontiguous none bwd 1517553 537555 2.823065547
NLLLoss float32 [64_21_230_335] 255 contiguous none fwd 1409203 297666 4.734175217
NLLLoss float32 [64_21_230_335] 255 contiguous none bwd 1936152 632594 3.060655017
NLLLoss float32 [64_21_230_335] 255 noncontiguous none fwd 1402580 410607 3.415869676
NLLLoss float32 [64_21_230_335] 255 noncontiguous none bwd 1916921 676291 2.834461792
NLLLoss float32 [64_21_198_257] 255 contiguous none fwd 806544 200281 4.027061978
NLLLoss float32 [64_21_198_257] 255 contiguous none bwd 1187064 412580 2.877172912
NLLLoss float32 [64_21_198_257] 255 noncontiguous none fwd 800016 224583 3.562228664
NLLLoss float32 [64_21_198_257] 255 noncontiguous none bwd 1182856 385398 3.069180432
NLLLoss float32 [64_21_283_320] 255 contiguous none fwd 1762525 336244 5.241803571
NLLLoss float32 [64_21_283_320] 255 contiguous none bwd 2278179 734478 3.101766152
NLLLoss float32 [64_21_283_320] 255 noncontiguous none fwd 1757693 539530 3.257822549
NLLLoss float32 [64_21_283_320] 255 noncontiguous none bwd 2284946 851596 2.683133786
NLLLoss float32 [64_21_175_333] 255 contiguous none fwd 875855 226806 3.861692371
NLLLoss float32 [64_21_175_333] 255 contiguous none bwd 1291607 476313 2.711676986
NLLLoss float32 [64_21_175_333] 255 noncontiguous none fwd 862255 276956 3.113328471
NLLLoss float32 [64_21_175_333] 255 noncontiguous none bwd 1290454 465593 2.771635312
NLLLoss float32 [64_21_267_326] 255 contiguous none fwd 1648992 349257 4.721428633
NLLLoss float32 [64_21_267_326] 255 contiguous none bwd 2203893 731279 3.01375125
NLLLoss float32 [64_21_267_326] 255 noncontiguous none fwd 1646576 506660 3.249863814
NLLLoss float32 [64_21_267_326] 255 noncontiguous none bwd 2221669 811331 2.73830163
NLLLoss float32 [32_21_256_256] 255 contiguous none fwd 471623 126877 3.71716702
NLLLoss float32 [32_21_256_256] 255 contiguous none bwd 848928 264547 3.208987439
NLLLoss float32 [32_21_256_256] 255 noncontiguous none fwd 468679 160371 2.922467279
NLLLoss float32 [32_21_256_256] 255 noncontiguous none bwd 848208 262930 3.225984102
NLLLoss float32 [55_21_112_257] 255 contiguous none fwd 267451 110505 2.420261527
NLLLoss float32 [55_21_112_257] 255 contiguous none bwd 517974 207695 2.49391656
NLLLoss float32 [55_21_112_257] 255 noncontiguous none fwd 279291 138131 2.021928459
NLLLoss float32 [55_21_112_257] 255 noncontiguous none bwd 517942 221010 2.343522918
NLLLoss float32 [24_21_512_512] 255 contiguous none fwd 1664177 404422 4.114951709
NLLLoss float32 [24_21_512_512] 255 contiguous none bwd 2476978 895935 2.764684938
NLLLoss float32 [24_21_512_512] 255 noncontiguous none fwd 1645458 782319 2.103308241
NLLLoss float32 [24_21_512_512] 255 noncontiguous none bwd 2504066 1331502 1.880632549
NLLLoss float32 [16_21_512_512] 255 noncontiguous sum fwd 589775 604730 0.975269955
NLLLoss float32 [16_21_512_512] 255 noncontiguous sum bwd 1305245 809744 1.611923028
NLLLoss float32 [64_21_254_333] 255 noncontiguous sum fwd 849326 536917 1.581857159
NLLLoss float32 [64_21_254_333] 255 noncontiguous sum bwd 1178238 718266 1.640392278
NLLLoss float32 [64_21_213_331] 255 noncontiguous sum fwd 708335 406487 1.742577253
NLLLoss float32 [64_21_213_331] 255 noncontiguous sum bwd 989998 542018 1.826503917
NLLLoss float32 [64_21_240_332] 255 noncontiguous sum fwd 710798 494120 1.438512912
NLLLoss float32 [64_21_240_332] 255 noncontiguous sum bwd 1047054 660462 1.58533572
NLLLoss float32 [64_21_212_320] 255 noncontiguous sum fwd 607807 386121 1.574136087
NLLLoss float32 [64_21_212_320] 255 noncontiguous sum bwd 894366 517761 1.727372282
NLLLoss float32 [64_21_218_333] 255 noncontiguous sum fwd 729870 432544 1.687389029
NLLLoss float32 [64_21_218_333] 255 noncontiguous sum bwd 1007790 572489 1.760365701
NLLLoss float32 [64_21_270_333] 255 noncontiguous sum fwd 903406 601804 1.501163169
NLLLoss float32 [64_21_270_333] 255 noncontiguous sum bwd 1252109 823536 1.520405908
NLLLoss float32 [64_21_237_329] 255 noncontiguous sum fwd 784878 488340 1.607236761
NLLLoss float32 [64_21_237_329] 255 noncontiguous sum bwd 1082238 652210 1.659339783
NLLLoss float32 [64_21_225_246] 255 noncontiguous sum fwd 538943 297166 1.813609229
NLLLoss float32 [64_21_225_246] 255 noncontiguous sum bwd 767679 401359 1.912699105
NLLLoss float32 [64_21_240_292] 255 noncontiguous sum fwd 626719 405220 1.546614185
NLLLoss float32 [64_21_240_292] 255 noncontiguous sum bwd 924222 533964 1.730869497
NLLLoss float32 [64_21_288_303] 255 noncontiguous sum fwd 875534 544066 1.609242261
NLLLoss float32 [64_21_288_303] 255 noncontiguous sum bwd 1193854 734748 1.624848247
NLLLoss float32 [64_21_274_275] 255 noncontiguous sum fwd 757726 427785 1.771277628
NLLLoss float32 [64_21_274_275] 255 noncontiguous sum bwd 1047678 565900 1.851348295
NLLLoss float32 [64_21_273_322] 255 noncontiguous sum fwd 895774 580321 1.543583637
NLLLoss float32 [64_21_273_322] 255 noncontiguous sum bwd 1236254 773494 1.598272256
NLLLoss float32 [64_21_240_320] 255 noncontiguous sum fwd 687615 478687 1.436460568
NLLLoss float32 [64_21_240_320] 255 noncontiguous sum bwd 1011038 623754 1.620892211
NLLLoss float32 [64_21_238_269] 255 noncontiguous sum fwd 622574 369372 1.685493216
NLLLoss float32 [64_21_238_269] 255 noncontiguous sum bwd 880159 503773 1.747134126
NLLLoss float32 [64_21_213_326] 255 noncontiguous sum fwd 698159 394458 1.769919738
NLLLoss float32 [64_21_213_326] 255 noncontiguous sum bwd 966814 527881 1.831499902
NLLLoss float32 [64_21_297_333] 255 noncontiguous sum fwd 991582 680151 1.457885087
NLLLoss float32 [64_21_297_333] 255 noncontiguous sum bwd 1355726 916403 1.479399347
NLLLoss float32 [64_21_212_303] 255 noncontiguous sum fwd 624126 365250 1.70876386
NLLLoss float32 [64_21_212_303] 255 noncontiguous sum bwd 886175 503066 1.761548187
NLLLoss float32 [64_21_230_335] 255 noncontiguous sum fwd 775567 462230 1.677881141
NLLLoss float32 [64_21_230_335] 255 noncontiguous sum bwd 1069278 636295 1.680475251
NLLLoss float32 [64_21_198_257] 255 noncontiguous sum fwd 497743 265818 1.872495467
NLLLoss float32 [64_21_198_257] 255 noncontiguous sum bwd 707855 356131 1.98762534
NLLLoss float32 [64_21_283_320] 255 noncontiguous sum fwd 799311 593199 1.347458441
NLLLoss float32 [64_21_283_320] 255 noncontiguous sum bwd 1194974 804667 1.485054066
NLLLoss float32 [64_21_175_333] 255 noncontiguous sum fwd 570831 317880 1.795743677
NLLLoss float32 [64_21_175_333] 255 noncontiguous sum bwd 821439 431668 1.902941613
NLLLoss float32 [64_21_267_326] 255 noncontiguous sum fwd 874366 566738 1.542804612
NLLLoss float32 [64_21_267_326] 255 noncontiguous sum bwd 1239982 758602 1.634561997
NLLLoss float32 [32_21_256_256] 255 noncontiguous sum fwd 288960 187118 1.544266185
NLLLoss float32 [32_21_256_256] 255 noncontiguous sum bwd 1106558 246528 4.48856925
NLLLoss float32 [55_21_112_257] 255 noncontiguous sum fwd 244351 165591 1.475629714
NLLLoss float32 [55_21_112_257] 255 noncontiguous sum bwd 340575 222656 1.529601717
NLLLoss float32 [24_21_512_512] 255 noncontiguous sum fwd 881167 846644 1.040776288
NLLLoss float32 [24_21_512_512] 255 noncontiguous sum bwd 1873869 1302142 1.439066553
NLLLoss float32 [16_21_512_512] 255 noncontiguous mean fwd 588701 605975 0.971493874
NLLLoss float32 [16_21_512_512] 255 noncontiguous mean bwd 1305942 811641 1.60901433
NLLLoss float32 [64_21_254_333] 255 noncontiguous mean fwd 850405 536788 1.584247412
NLLLoss float32 [64_21_254_333] 255 noncontiguous mean bwd 1175563 720446 1.631715632
NLLLoss float32 [64_21_213_331] 255 noncontiguous mean fwd 709146 405147 1.750342468
NLLLoss float32 [64_21_213_331] 255 noncontiguous mean bwd 975121 542247 1.798296717
NLLLoss float32 [64_21_240_332] 255 noncontiguous mean fwd 711002 495156 1.435915146
NLLLoss float32 [64_21_240_332] 255 noncontiguous mean bwd 1049359 661162 1.587143544
NLLLoss float32 [64_21_212_320] 255 noncontiguous mean fwd 607725 389150 1.561672877
NLLLoss float32 [64_21_212_320] 255 noncontiguous mean bwd 894292 516135 1.732670716
NLLLoss float32 [64_21_218_333] 255 noncontiguous mean fwd 730650 432973 1.687518621
NLLLoss float32 [64_21_218_333] 255 noncontiguous mean bwd 1007218 572100 1.760562839
NLLLoss float32 [64_21_270_333] 255 noncontiguous mean fwd 902793 602839 1.497569003
NLLLoss float32 [64_21_270_333] 255 noncontiguous mean bwd 1250752 823103 1.519557091
NLLLoss float32 [64_21_237_329] 255 noncontiguous mean fwd 784846 487125 1.611179882
NLLLoss float32 [64_21_237_329] 255 noncontiguous mean bwd 1081943 654164 1.653932347
NLLLoss float32 [64_21_225_246] 255 noncontiguous mean fwd 544181 296763 1.833722533
NLLLoss float32 [64_21_225_246] 255 noncontiguous mean bwd 769312 398984 1.928177571
NLLLoss float32 [64_21_240_292] 255 noncontiguous mean fwd 627331 405190 1.548239098
NLLLoss float32 [64_21_240_292] 255 noncontiguous mean bwd 922413 535002 1.724130003
NLLLoss float32 [64_21_288_303] 255 noncontiguous mean fwd 875695 547589 1.599182964
NLLLoss float32 [64_21_288_303] 255 noncontiguous mean bwd 1195737 734362 1.628266441
NLLLoss float32 [64_21_274_275] 255 noncontiguous mean fwd 758402 429724 1.764858374
NLLLoss float32 [64_21_274_275] 255 noncontiguous mean bwd 1046317 567502 1.843723899
NLLLoss float32 [64_21_273_322] 255 noncontiguous mean fwd 881969 580196 1.52012251
NLLLoss float32 [64_21_273_322] 255 noncontiguous mean bwd 1230730 774436 1.589195234
NLLLoss float32 [64_21_240_320] 255 noncontiguous mean fwd 690420 477156 1.446948168
NLLLoss float32 [64_21_240_320] 255 noncontiguous mean bwd 1015743 619841 1.638715413
NLLLoss float32 [64_21_238_269] 255 noncontiguous mean fwd 624822 369156 1.692568995
NLLLoss float32 [64_21_238_269] 255 noncontiguous mean bwd 892113 502703 1.774632338
NLLLoss float32 [64_21_213_326] 255 noncontiguous mean fwd 699269 393903 1.775231466
NLLLoss float32 [64_21_213_326] 255 noncontiguous mean bwd 966512 528446 1.828970226
NLLLoss float32 [64_21_297_333] 255 noncontiguous mean fwd 991504 680216 1.457631105
NLLLoss float32 [64_21_297_333] 255 noncontiguous mean bwd 1355450 916377 1.479140136
NLLLoss float32 [64_21_212_303] 255 noncontiguous mean fwd 623894 365744 1.705821558
NLLLoss float32 [64_21_212_303] 255 noncontiguous mean bwd 885426 501121 1.766890631
NLLLoss float32 [64_21_230_335] 255 noncontiguous mean fwd 777700 462740 1.680641397
NLLLoss float32 [64_21_230_335] 255 noncontiguous mean bwd 1069583 636110 1.681443461
NLLLoss float32 [64_21_198_257] 255 noncontiguous mean fwd 499368 265033 1.884172914
NLLLoss float32 [64_21_198_257] 255 noncontiguous mean bwd 705301 354349 1.990413406
NLLLoss float32 [64_21_283_320] 255 noncontiguous mean fwd 807491 593301 1.361014055
NLLLoss float32 [64_21_283_320] 255 noncontiguous mean bwd 1189486 804148 1.479187911
NLLLoss float32 [64_21_175_333] 255 noncontiguous mean fwd 570247 319167 1.786672808
NLLLoss float32 [64_21_175_333] 255 noncontiguous mean bwd 819475 433176 1.891783017
NLLLoss float32 [64_21_267_326] 255 noncontiguous mean fwd 893666 568289 1.572555513
NLLLoss float32 [64_21_267_326] 255 noncontiguous mean bwd 1215629 754441 1.611297636
NLLLoss float32 [32_21_256_256] 255 noncontiguous mean fwd 290299 187948 1.544570839
NLLLoss float32 [32_21_256_256] 255 noncontiguous mean bwd 1102975 246331 4.477613455
NLLLoss float32 [55_21_112_257] 255 noncontiguous mean fwd 243868 165192 1.476270037
NLLLoss float32 [55_21_112_257] 255 noncontiguous mean bwd 341755 222952 1.532863576
NLLLoss float32 [24_21_512_512] 255 noncontiguous mean fwd 880434 847616 1.038718004
NLLLoss float32 [24_21_512_512] 255 noncontiguous mean bwd 1867123 1305291 1.430426625
NLLLoss float32 [2_3_128_128_128] -100 contiguous none fwd 108820 95323 1.14159227
NLLLoss float32 [2_3_128_128_128] -100 contiguous none bwd 217112 111465 1.947804243
NLLLoss float32 [2_3_128_128_128] -100 noncontiguous none fwd 198436 263248 0.753798699
NLLLoss float32 [2_3_128_128_128] -100 noncontiguous none bwd 217434 247871 0.877206289
NLLLoss float32 [2_3_128_128_128] -100 contiguous mean fwd 167046 140602 1.188076983
NLLLoss float32 [2_3_128_128_128] -100 contiguous mean bwd 214152 97456 2.197422427
NLLLoss float32 [2_3_128_128_128] -100 noncontiguous mean fwd 262017 324367 0.80777946
NLLLoss float32 [2_3_128_128_128] -100 noncontiguous mean bwd 214088 234929 0.911288091
NLLLoss float32 [2_3_128_128_128] -100 contiguous sum fwd 164294 140371 1.17042694
NLLLoss float32 [2_3_128_128_128] -100 contiguous sum bwd 214184 97349 2.200166412
NLLLoss float32 [2_3_128_128_128] -100 noncontiguous sum fwd 258788 312154 0.829039513
NLLLoss float32 [2_3_128_128_128] -100 noncontiguous sum bwd 214030 238591 0.897058146
NLLLoss float32 [256_81_8732] -100 contiguous none fwd 436751 199890 2.184956726
NLLLoss float32 [256_81_8732] -100 contiguous none bwd 1093797 441664 2.476536462
NLLLoss float32 [256_81_8732] -100 noncontiguous none fwd 1776074 107589 16.50795156
NLLLoss float32 [256_81_8732] -100 noncontiguous none bwd 2382248 180815 13.17505738
NLLLoss float32 [256_81_8732] -100 contiguous mean fwd 212087 233525 0.908198266
NLLLoss float32 [256_81_8732] -100 contiguous mean bwd 981265 433024 2.266075321
NLLLoss float32 [256_81_8732] -100 noncontiguous mean fwd 1544778 142771 10.81997044
NLLLoss float32 [256_81_8732] -100 noncontiguous mean bwd 978418 151731 6.448372449
NLLLoss float32 [256_81_8732] -100 contiguous sum fwd 207207 233507 0.887369544
NLLLoss float32 [256_81_8732] -100 contiguous sum bwd 981088 434980 2.255478413
NLLLoss float32 [256_81_8732] -100 noncontiguous sum fwd 1537518 142486 10.79066014
NLLLoss float32 [256_81_8732] -100 noncontiguous sum bwd 979988 151571 6.465537603
NLLLoss float32 [256_100] -100 contiguous none fwd 9936 9244 1.074859368
NLLLoss float32 [256_100] -100 contiguous none bwd 14960 7680 1.947916667
NLLLoss float32 [256_100] -100 noncontiguous none fwd 16888 9191 1.837449679
NLLLoss float32 [256_100] -100 noncontiguous none bwd 20612 8409 2.451183256
NLLLoss float32 [256_100] -100 contiguous mean fwd 15936 15433 1.032592497
NLLLoss float32 [256_100] -100 contiguous mean bwd 13505 6774 1.9936522
NLLLoss float32 [256_100] -100 noncontiguous mean fwd 19041 15718 1.211413666
NLLLoss float32 [256_100] -100 noncontiguous mean bwd 20961 6632 3.160585042
NLLLoss float32 [256_100] -100 contiguous sum fwd 15553 15734 0.98849625
NLLLoss float32 [256_100] -100 contiguous sum bwd 13073 6952 1.880466053
NLLLoss float32 [256_100] -100 noncontiguous sum fwd 18459 15806 1.167847653
NLLLoss float32 [256_100] -100 noncontiguous sum bwd 20160 6739 2.991541772
NLLLoss float32 [40_2] -100 contiguous none fwd 8416 7449 1.129816083
NLLLoss float32 [40_2] -100 contiguous none bwd 14400 7502 1.919488136
NLLLoss float32 [40_2] -100 noncontiguous none fwd 11622 7413 1.567786321
NLLLoss float32 [40_2] -100 noncontiguous none bwd 16945 7467 2.269318334
NLLLoss float32 [40_2] -100 contiguous mean fwd 8464 13920 0.608045977
NLLLoss float32 [40_2] -100 contiguous mean bwd 9168 6204 1.477756286
NLLLoss float32 [40_2] -100 noncontiguous mean fwd 12480 14044 0.888635716
NLLLoss float32 [40_2] -100 noncontiguous mean bwd 14603 6293 2.320514858
NLLLoss float32 [40_2] -100 contiguous sum fwd 7216 14293 0.50486252
NLLLoss float32 [40_2] -100 contiguous sum bwd 8128 6044 1.344804765
NLLLoss float32 [40_2] -100 noncontiguous sum fwd 11738 14044 0.835801766
NLLLoss float32 [40_2] -100 noncontiguous sum bwd 13440 6133 2.191423447
NLLLoss float32 [8192_52100] -100 contiguous none fwd 33745 15574 2.166752279
NLLLoss float32 [8192_52100] -100 contiguous none bwd 1184589 25957 45.63659129
NLLLoss float32 [8192_52100] -100 noncontiguous none fwd 3299934 15396 214.3371005
NLLLoss float32 [8192_52100] -100 noncontiguous none bwd 4476791 24890 179.8630374
NLLLoss float32 [8192_52100] -100 contiguous mean fwd 1735257 33797 51.34352161
NLLLoss float32 [8192_52100] -100 contiguous mean bwd 1459906 24659 59.20377955
NLLLoss float32 [8192_52100] -100 noncontiguous mean fwd 3642497 34135 106.7085689
NLLLoss float32 [8192_52100] -100 noncontiguous mean bwd 4738202 14152 334.8079423
NLLLoss float32 [8192_52100] -100 contiguous sum fwd 458394 35771 12.81468228
NLLLoss float32 [8192_52100] -100 contiguous sum bwd 1457040 14170 102.8256881
NLLLoss float32 [8192_52100] -100 noncontiguous sum fwd 3649286 25565 142.7453941
NLLLoss float32 [8192_52100] -100 noncontiguous sum bwd 4741964 24428 194.1200262
NLLLoss float32 [20480_50000] -100 contiguous none fwd 42704 23501 1.817114165
NLLLoss float32 [20480_50000] -100 contiguous none bwd 2804726 29812 94.08043741
NLLLoss float32 [20480_50000] -100 noncontiguous none fwd 7850643 23217 338.142008
NLLLoss float32 [20480_50000] -100 noncontiguous none bwd 10698810 23591 453.5123564
NLLLoss float32 [20480_50000] -100 contiguous mean fwd 1125945 31590 35.64245014
NLLLoss float32 [20480_50000] -100 contiguous mean bwd 3437690 27963 122.9370954
NLLLoss float32 [20480_50000] -100 noncontiguous mean fwd 8752699 37706 232.1301384
NLLLoss float32 [20480_50000] -100 noncontiguous mean bwd 11695705 28764 406.6091295
NLLLoss float32 [20480_50000] -100 contiguous sum fwd 1121600 35430 31.65678803
NLLLoss float32 [20480_50000] -100 contiguous sum bwd 3445887 26844 128.3671211
NLLLoss float32 [20480_50000] -100 noncontiguous sum fwd 8711305 32585 267.3409544
NLLLoss float32 [20480_50000] -100 noncontiguous sum bwd 11648068 26008 447.8648108
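
The "improvement over rocm" column appears to be rocm_kernel_avg divided by miopen_kernel_time, so values below 1 mean the MIOpen kernel was slower in that configuration. A minimal Python sketch that checks this reading against two rows copied from the float32 table above (the `speedup` helper is a hypothetical name, not part of MIOpen or its driver):

```python
import math

# (rocm_kernel_avg, miopen_kernel_time, reported improvement),
# copied verbatim from two float32 rows of the table above.
rows = [
    (589775, 604730, 0.975269955),     # [16_21_512_512], noncontiguous, sum, fwd
    (11648068, 26008, 447.8648108),    # 20480 x 50000, noncontiguous, sum, bwd
]


def speedup(rocm_time, miopen_time):
    """Baseline time over MIOpen time; above 1 means MIOpen is faster."""
    return rocm_time / miopen_time


for rocm, miopen, reported in rows:
    ratio = speedup(rocm, miopen)
    # Assumption being tested: the reported column is this plain ratio.
    assert math.isclose(ratio, reported, rel_tol=1e-6)
    print(f"{rocm} / {miopen} = {ratio:.6f} (reported {reported})")
```

Both rows agree with the plain ratio to the printed precision, which supports this reading of the column; the time unit of the two kernel columns is not stated in the table.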
Nllloss bfloat16
op_name dtype size ignore_index contiguous reduction direction rocm_kernel_avg miopen_kernel_time improvement over rocm
NLLLoss bfloat16 [16_21_512_512] 255 contiguous none fwd 891301 194631 4.579440069
NLLLoss bfloat16 [16_21_512_512] 255 contiguous none bwd 1436993 371946 3.863445231
NLLLoss bfloat16 [16_21_512_512] 255 noncontiguous none fwd 889589 500353 1.777922787
NLLLoss bfloat16 [16_21_512_512] 255 noncontiguous none bwd 1440529 684550 2.10434446
NLLLoss bfloat16 [64_21_254_333] 255 contiguous none fwd 1311214 223626 5.863423752
NLLLoss bfloat16 [64_21_254_333] 255 contiguous none bwd 1826042 426097 4.28550776
NLLLoss bfloat16 [64_21_254_333] 255 noncontiguous none fwd 1310094 379199 3.454898352
NLLLoss bfloat16 [64_21_254_333] 255 noncontiguous none bwd 1824665 582256 3.133784796
NLLLoss bfloat16 [64_21_213_331] 255 contiguous none fwd 1101801 184889 5.959256635
NLLLoss bfloat16 [64_21_213_331] 255 contiguous none bwd 1516514 365332 4.151057121
NLLLoss bfloat16 [64_21_213_331] 255 noncontiguous none fwd 1111721 277475 4.006562753
NLLLoss bfloat16 [64_21_213_331] 255 noncontiguous none bwd 1510322 420106 3.595097428
NLLLoss bfloat16 [64_21_240_332] 255 contiguous none fwd 1374686 209773 6.553207515
NLLLoss bfloat16 [64_21_240_332] 255 contiguous none bwd 1792920 398338 4.501001662
NLLLoss bfloat16 [64_21_240_332] 255 noncontiguous none fwd 1361070 348680 3.903493174
NLLLoss bfloat16 [64_21_240_332] 255 noncontiguous none bwd 1785368 518870 3.440877291
NLLLoss bfloat16 [64_21_212_320] 255 contiguous none fwd 1132329 193425 5.854098488
NLLLoss bfloat16 [64_21_212_320] 255 contiguous none bwd 1499585 336702 4.453745448
NLLLoss bfloat16 [64_21_212_320] 255 noncontiguous none fwd 1134489 266202 4.261759867
NLLLoss bfloat16 [64_21_212_320] 255 noncontiguous none bwd 1501793 389286 3.857814049
NLLLoss bfloat16 [64_21_218_333] 255 contiguous none fwd 1140008 190440 5.986179374
NLLLoss bfloat16 [64_21_218_333] 255 contiguous none bwd 1582738 363549 4.353575447
NLLLoss bfloat16 [64_21_218_333] 255 noncontiguous none fwd 1137961 297829 3.820853577
NLLLoss bfloat16 [64_21_218_333] 255 noncontiguous none bwd 1592499 449552 3.542413336
NLLLoss bfloat16 [64_21_270_333] 255 contiguous none fwd 1483838 234955 6.31541359
NLLLoss bfloat16 [64_21_270_333] 255 contiguous none bwd 2009448 449663 4.468786625
NLLLoss bfloat16 [64_21_270_333] 255 noncontiguous none fwd 1477033 428669 3.445625879
NLLLoss bfloat16 [64_21_270_333] 255 noncontiguous none bwd 2016082 659430 3.057310101
NLLLoss bfloat16 [64_21_237_329] 255 contiguous none fwd 1249832 203848 6.131195793
NLLLoss bfloat16 [64_21_237_329] 255 contiguous none bwd 1722699 392354 4.390675258
NLLLoss bfloat16 [64_21_237_329] 255 noncontiguous none fwd 1248198 340837 3.662155224
NLLLoss bfloat16 [64_21_237_329] 255 noncontiguous none bwd 1725448 517646 3.333258636
NLLLoss bfloat16 [64_21_225_246] 255 contiguous none fwd 832399 145984 5.701987889
NLLLoss bfloat16 [64_21_225_246] 255 contiguous none bwd 1182239 270210 4.375259983
NLLLoss bfloat16 [64_21_225_246] 255 noncontiguous none fwd 822718 191832 4.288742233
NLLLoss bfloat16 [64_21_225_246] 255 noncontiguous none bwd 1183438 282868 4.18371113
NLLLoss bfloat16 [64_21_240_292] 255 contiguous none fwd 1125128 182891 6.151904686
NLLLoss bfloat16 [64_21_240_292] 255 contiguous none bwd 1548005 353248 4.38220457
NLLLoss bfloat16 [64_21_240_292] 255 noncontiguous none fwd 1145191 281056 4.074600791
NLLLoss bfloat16 [64_21_240_292] 255 noncontiguous none bwd 1553396 413478 3.756901214
NLLLoss bfloat16 [64_21_288_303] 255 contiguous none fwd 1579565 230695 6.846984113
NLLLoss bfloat16 [64_21_288_303] 255 contiguous none bwd 2040840 442083 4.616418184
NLLLoss bfloat16 [64_21_288_303] 255 noncontiguous none fwd 1578764 382974 4.122379065
NLLLoss bfloat16 [64_21_288_303] 255 noncontiguous none bwd 2041446 575732 3.545826878
NLLLoss bfloat16 [64_21_274_275] 255 contiguous none fwd 1293404 197185 6.559342749
NLLLoss bfloat16 [64_21_274_275] 255 contiguous none bwd 1722390 378869 4.546136
NLLLoss bfloat16 [64_21_274_275] 255 noncontiguous none fwd 1283692 295174 4.348933172
NLLLoss bfloat16 [64_21_274_275] 255 noncontiguous none bwd 1713030 434120 3.945982678
NLLLoss bfloat16 [64_21_273_322] 255 contiguous none fwd 1484342 230892 6.428728583
NLLLoss bfloat16 [64_21_273_322] 255 contiguous none bwd 1978638 446335 4.433078293
NLLLoss bfloat16 [64_21_273_322] 255 noncontiguous none fwd 1463958 413482 3.540560411
NLLLoss bfloat16 [64_21_273_322] 255 noncontiguous none bwd 1977101 628268 3.146907052
NLLLoss bfloat16 [64_21_240_320] 255 contiguous none fwd 1308136 207959 6.29035531
NLLLoss bfloat16 [64_21_240_320] 255 contiguous none bwd 1717039 390124 4.401264726
NLLLoss bfloat16 [64_21_240_320] 255 noncontiguous none fwd 1305879 334214 3.907313877
NLLLoss bfloat16 [64_21_240_320] 255 noncontiguous none bwd 1715776 490460 3.498299556
NLLLoss bfloat16 [64_21_238_269] 255 contiguous none fwd 1017900 165419 6.153464838
NLLLoss bfloat16 [64_21_238_269] 255 contiguous none bwd 1391045 314447 4.423782068
NLLLoss bfloat16 [64_21_238_269] 255 noncontiguous none fwd 1007692 247212 4.076226073
NLLLoss bfloat16 [64_21_238_269] 255 noncontiguous none bwd 1389701 370481 3.751072255
NLLLoss bfloat16 [64_21_213_326] 255 contiguous none fwd 1070602 181579 5.896067277
NLLLoss bfloat16 [64_21_213_326] 255 contiguous none bwd 1499410 360259 4.162033426
NLLLoss bfloat16 [64_21_213_326] 255 noncontiguous none fwd 1065402 265790 4.008435231
NLLLoss bfloat16 [64_21_213_326] 255 noncontiguous none bwd 1495538 391014 3.824768423
NLLLoss bfloat16 [64_21_297_333] 255 contiguous none fwd 1779036 263497 6.751636641
NLLLoss bfloat16 [64_21_297_333] 255 contiguous none bwd 2316529 521146 4.445067217
NLLLoss bfloat16 [64_21_297_333] 255 noncontiguous none fwd 1799691 499760 3.601110533
NLLLoss bfloat16 [64_21_297_333] 255 noncontiguous none bwd 2315601 763470 3.032995403
NLLLoss bfloat16 [64_21_212_303] 255 contiguous none fwd 1012636 168939 5.994092542
NLLLoss bfloat16 [64_21_212_303] 255 contiguous none bwd 1381220 312475 4.420257621
NLLLoss bfloat16 [64_21_212_303] 255 noncontiguous none fwd 1006540 243765 4.129140771
NLLLoss bfloat16 [64_21_212_303] 255 noncontiguous none bwd 1380276 366127 3.769937754
NLLLoss bfloat16 [64_21_230_335] 255 contiguous none fwd 1218807 201366 6.052695093
NLLLoss bfloat16 [64_21_230_335] 255 contiguous none bwd 1690414 387460 4.362809064
NLLLoss bfloat16 [64_21_230_335] 255 noncontiguous none fwd 1237591 314555 3.934418464
NLLLoss bfloat16 [64_21_230_335] 255 noncontiguous none bwd 1689230 483370 3.494693506
NLLLoss bfloat16 [64_21_198_257] 255 contiguous none fwd 704946 136691 5.157223226
NLLLoss bfloat16 [64_21_198_257] 255 contiguous none bwd 1075050 255072 4.214692322
NLLLoss bfloat16 [64_21_198_257] 255 noncontiguous none fwd 699554 164869 4.243089968
NLLLoss bfloat16 [64_21_198_257] 255 noncontiguous none bwd 1077147 242965 4.433342251
NLLLoss bfloat16 [64_21_283_320] 255 contiguous none fwd 1656751 239730 6.910903934
NLLLoss bfloat16 [64_21_283_320] 255 contiguous none bwd 2113478 462073 4.573904989
NLLLoss bfloat16 [64_21_283_320] 255 noncontiguous none fwd 1633936 434820 3.757729635
NLLLoss bfloat16 [64_21_283_320] 255 noncontiguous none bwd 2116694 645466 3.279326874
NLLLoss bfloat16 [64_21_175_333] 255 contiguous none fwd 809440 151767 5.333438758
NLLLoss bfloat16 [64_21_175_333] 255 contiguous none bwd 1175465 297293 3.95389397
NLLLoss bfloat16 [64_21_175_333] 255 noncontiguous none fwd 782289 213739 3.660019931
NLLLoss bfloat16 [64_21_175_333] 255 noncontiguous none bwd 1177001 309667 3.800860279
NLLLoss bfloat16 [64_21_267_326] 255 contiguous none fwd 1476068 232654 6.344477206
NLLLoss bfloat16 [64_21_267_326] 255 contiguous none bwd 1982890 448865 4.417564301
NLLLoss bfloat16 [64_21_267_326] 255 noncontiguous none fwd 1467348 402039 3.649765321
NLLLoss bfloat16 [64_21_267_326] 255 noncontiguous none bwd 1980362 604881 3.273969591
NLLLoss bfloat16 [32_21_256_256] 255 contiguous none fwd 379865 90239 4.209543545
NLLLoss bfloat16 [32_21_256_256] 255 contiguous none bwd 714098 159073 4.489121347
NLLLoss bfloat16 [32_21_256_256] 255 noncontiguous none fwd 374649 130700 2.86648049
NLLLoss bfloat16 [32_21_256_256] 255 noncontiguous none bwd 719282 179340 4.010717074
NLLLoss bfloat16 [55_21_112_257] 255 contiguous none fwd 243068 71146 3.416467546
NLLLoss bfloat16 [55_21_112_257] 255 contiguous none bwd 455880 123963 3.677548946
NLLLoss bfloat16 [55_21_112_257] 255 noncontiguous none fwd 247819 94488 2.622756329
NLLLoss bfloat16 [55_21_112_257] 255 noncontiguous none bwd 458424 157758 2.905868482
NLLLoss bfloat16 [24_21_512_512] 255 contiguous none fwd 1549300 293063 5.286576606
NLLLoss bfloat16 [24_21_512_512] 255 contiguous none bwd 2463139 553522 4.449938756
NLLLoss bfloat16 [24_21_512_512] 255 noncontiguous none fwd 1547188 690107 2.241953784
NLLLoss bfloat16 [24_21_512_512] 255 noncontiguous none bwd 2457636 1165105 2.109368684
NLLLoss bfloat16 [16_21_512_512] 255 noncontiguous sum fwd 513999 543995 0.944859787
NLLLoss bfloat16 [16_21_512_512] 255 noncontiguous sum bwd 1254477 666530 1.882101331
NLLLoss bfloat16 [64_21_254_333] 255 noncontiguous sum fwd 651071 441317 1.475291004
NLLLoss bfloat16 [64_21_254_333] 255 noncontiguous sum bwd 783310 542233 1.444600384
NLLLoss bfloat16 [64_21_213_331] 255 noncontiguous sum fwd 544943 334000 1.631565868
NLLLoss bfloat16 [64_21_213_331] 255 noncontiguous sum bwd 647871 383462 1.689531166
NLLLoss bfloat16 [64_21_240_332] 255 noncontiguous sum fwd 655310 404352 1.620642411
NLLLoss bfloat16 [64_21_240_332] 255 noncontiguous sum bwd 711870 477335 1.491342558
NLLLoss bfloat16 [64_21_212_320] 255 noncontiguous sum fwd 535535 318413 1.681887988
NLLLoss bfloat16 [64_21_212_320] 255 noncontiguous sum bwd 592207 357940 1.654486785
NLLLoss bfloat16 [64_21_218_333] 255 noncontiguous sum fwd 560239 353171 1.586310881
NLLLoss bfloat16 [64_21_218_333] 255 noncontiguous sum bwd 666815 408652 1.631742901
NLLLoss bfloat16 [64_21_270_333] 255 noncontiguous sum fwd 691167 490557 1.408943303
NLLLoss bfloat16 [64_21_270_333] 255 noncontiguous sum bwd 888078 607139 1.462725998
NLLLoss bfloat16 [64_21_237_329] 255 noncontiguous sum fwd 600974 399881 1.502882107
NLLLoss bfloat16 [64_21_237_329] 255 noncontiguous sum bwd 712767 472644 1.508041994
NLLLoss bfloat16 [64_21_225_246] 255 noncontiguous sum fwd 422815 236866 1.785038798
NLLLoss bfloat16 [64_21_225_246] 255 noncontiguous sum bwd 536767 259870 2.065521222
NLLLoss bfloat16 [64_21_240_292] 255 noncontiguous sum fwd 578527 334288 1.730624491
NLLLoss bfloat16 [64_21_240_292] 255 noncontiguous sum bwd 624943 377985 1.653353969
NLLLoss bfloat16 [64_21_288_303] 255 noncontiguous sum fwd 685151 441098 1.553285211
NLLLoss bfloat16 [64_21_288_303] 255 noncontiguous sum bwd 786158 538147 1.460861066
NLLLoss bfloat16 [64_21_274_275] 255 noncontiguous sum fwd 580191 346061 1.676557023
NLLLoss bfloat16 [64_21_274_275] 255 noncontiguous sum bwd 693199 401119 1.728162964
NLLLoss bfloat16 [64_21_273_322] 255 noncontiguous sum fwd 675534 475574 1.420460328
NLLLoss bfloat16 [64_21_273_322] 255 noncontiguous sum bwd 868878 573353 1.515432901
NLLLoss bfloat16 [64_21_240_320] 255 noncontiguous sum fwd 604623 388731 1.55537634
NLLLoss bfloat16 [64_21_240_320] 255 noncontiguous sum bwd 675471 450012 1.50100664
NLLLoss bfloat16 [64_21_238_269] 255 noncontiguous sum fwd 487359 297354 1.638985855
NLLLoss bfloat16 [64_21_238_269] 255 noncontiguous sum bwd 605679 342670 1.767528526
NLLLoss bfloat16 [64_21_213_326] 255 noncontiguous sum fwd 536895 318368 1.686397502
NLLLoss bfloat16 [64_21_213_326] 255 noncontiguous sum bwd 643471 362244 1.776346882
NLLLoss bfloat16 [64_21_297_333] 255 noncontiguous sum fwd 757454 564648 1.341462292
NLLLoss bfloat16 [64_21_297_333] 255 noncontiguous sum bwd 975694 716739 1.361296092
NLLLoss bfloat16 [64_21_212_303] 255 noncontiguous sum fwd 488831 295827 1.652421855
NLLLoss bfloat16 [64_21_212_303] 255 noncontiguous sum bwd 616191 345161 1.785227763
NLLLoss bfloat16 [64_21_230_335] 255 noncontiguous sum fwd 593599 369730 1.605493198
NLLLoss bfloat16 [64_21_230_335] 255 noncontiguous sum bwd 705583 450159 1.567408405
NLLLoss bfloat16 [64_21_198_257] 255 noncontiguous sum fwd 396640 209871 1.889922857
NLLLoss bfloat16 [64_21_198_257] 255 noncontiguous sum bwd 496303 228289 2.174011888
NLLLoss bfloat16 [64_21_283_320] 255 noncontiguous sum fwd 739887 496781 1.489362516
NLLLoss bfloat16 [64_21_283_320] 255 noncontiguous sum bwd 871711 613217 1.421537563
NLLLoss bfloat16 [64_21_175_333] 255 noncontiguous sum fwd 445343 260303 1.710863878
NLLLoss bfloat16 [64_21_175_333] 255 noncontiguous sum bwd 567695 287180 1.976791559
NLLLoss bfloat16 [64_21_267_326] 255 noncontiguous sum fwd 670335 463066 1.447601422
NLLLoss bfloat16 [64_21_267_326] 255 noncontiguous sum bwd 797710 566028 1.409311907
NLLLoss bfloat16 [32_21_256_256] 255 noncontiguous sum fwd 278608 161520 1.724913323
NLLLoss bfloat16 [32_21_256_256] 255 noncontiguous sum bwd 1012126 161484 6.267655
NLLLoss bfloat16 [55_21_112_257] 255 noncontiguous sum fwd 197520 123051 1.605188093
NLLLoss bfloat16 [55_21_112_257] 255 noncontiguous sum bwd 245552 144330 1.701323356
NLLLoss bfloat16 [24_21_512_512] 255 noncontiguous sum fwd 765486 756658 1.011667094
NLLLoss bfloat16 [24_21_512_512] 255 noncontiguous sum bwd 1951034 1137849 1.714668642
NLLLoss bfloat16 [16_21_512_512] 255 noncontiguous mean fwd 514528 544680 0.944642726
NLLLoss bfloat16 [16_21_512_512] 255 noncontiguous mean bwd 1230089 667699 1.842280728
NLLLoss bfloat16 [64_21_254_333] 255 noncontiguous mean fwd 649979 440150 1.476721572
NLLLoss bfloat16 [64_21_254_333] 255 noncontiguous mean bwd 782903 541090 1.446899776
NLLLoss bfloat16 [64_21_213_331] 255 noncontiguous mean fwd 544239 332883 1.634925785
NLLLoss bfloat16 [64_21_213_331] 255 noncontiguous mean bwd 649180 384455 1.688572135
NLLLoss bfloat16 [64_21_240_332] 255 noncontiguous mean fwd 655851 405789 1.616236517
NLLLoss bfloat16 [64_21_240_332] 255 noncontiguous mean bwd 711498 476419 1.493429103
NLLLoss bfloat16 [64_21_212_320] 255 noncontiguous mean fwd 535999 318289 1.684001018
NLLLoss bfloat16 [64_21_212_320] 255 noncontiguous mean bwd 592974 359053 1.651494348
NLLLoss bfloat16 [64_21_218_333] 255 noncontiguous mean fwd 561632 353081 1.5906605
NLLLoss bfloat16 [64_21_218_333] 255 noncontiguous mean bwd 666589 407889 1.634241178
NLLLoss bfloat16 [64_21_270_333] 255 noncontiguous mean fwd 692383 491729 1.408058097
NLLLoss bfloat16 [64_21_270_333] 255 noncontiguous mean bwd 892250 609061 1.464959996
NLLLoss bfloat16 [64_21_237_329] 255 noncontiguous mean fwd 601331 399588 1.504877524
NLLLoss bfloat16 [64_21_237_329] 255 noncontiguous mean bwd 717552 472939 1.517218923
NLLLoss bfloat16 [64_21_225_246] 255 noncontiguous mean fwd 424407 236408 1.795231126
NLLLoss bfloat16 [64_21_225_246] 255 noncontiguous mean bwd 537445 260994 2.059223584
NLLLoss bfloat16 [64_21_240_292] 255 noncontiguous mean fwd 577877 333972 1.730315715
NLLLoss bfloat16 [64_21_240_292] 255 noncontiguous mean bwd 624356 379110 1.646899317
NLLLoss bfloat16 [64_21_288_303] 255 noncontiguous mean fwd 685283 441937 1.550635045
NLLLoss bfloat16 [64_21_288_303] 255 noncontiguous mean bwd 790402 537350 1.47092584
NLLLoss bfloat16 [64_21_274_275] 255 noncontiguous mean fwd 581094 349546 1.662424974
NLLLoss bfloat16 [64_21_274_275] 255 noncontiguous mean bwd 693988 397582 1.745521679
NLLLoss bfloat16 [64_21_273_322] 255 noncontiguous mean fwd 676404 476125 1.420643739
NLLLoss bfloat16 [64_21_273_322] 255 noncontiguous mean bwd 870545 572320 1.521080864
NLLLoss bfloat16 [64_21_240_320] 255 noncontiguous mean fwd 604534 388961 1.554227802
NLLLoss bfloat16 [64_21_240_320] 255 noncontiguous mean bwd 679365 452196 1.502368442
NLLLoss bfloat16 [64_21_238_269] 255 noncontiguous mean fwd 488280 298650 1.634957308
NLLLoss bfloat16 [64_21_238_269] 255 noncontiguous mean bwd 613062 343663 1.783904581
NLLLoss bfloat16 [64_21_213_326] 255 noncontiguous mean fwd 535976 319450 1.677808734
NLLLoss bfloat16 [64_21_213_326] 255 noncontiguous mean bwd 642166 363894 1.764706206
NLLLoss bfloat16 [64_21_297_333] 255 noncontiguous mean fwd 758116 567309 1.336336987
NLLLoss bfloat16 [64_21_297_333] 255 noncontiguous mean bwd 982576 718030 1.368433074
NLLLoss bfloat16 [64_21_212_303] 255 noncontiguous mean fwd 489112 297548 1.64380873
NLLLoss bfloat16 [64_21_212_303] 255 noncontiguous mean bwd 620198 345228 1.796488118
NLLLoss bfloat16 [64_21_230_335] 255 noncontiguous mean fwd 594503 372784 1.594765333
NLLLoss bfloat16 [64_21_230_335] 255 noncontiguous mean bwd 708709 452091 1.56762466
NLLLoss bfloat16 [64_21_198_257] 255 noncontiguous mean fwd 391514 210010 1.864263606
NLLLoss bfloat16 [64_21_198_257] 255 noncontiguous mean bwd 494776 228019 2.169889351
NLLLoss bfloat16 [64_21_283_320] 255 noncontiguous mean fwd 742260 497941 1.490658532
NLLLoss bfloat16 [64_21_283_320] 255 noncontiguous mean bwd 871971 613657 1.420941992
NLLLoss bfloat16 [64_21_175_333] 255 noncontiguous mean fwd 444745 260268 1.708796318
NLLLoss bfloat16 [64_21_175_333] 255 noncontiguous mean bwd 568759 287806 1.976188822
NLLLoss bfloat16 [64_21_267_326] 255 noncontiguous mean fwd 669862 463932 1.443879707
NLLLoss bfloat16 [64_21_267_326] 255 noncontiguous mean bwd 796579 568271 1.40175902
NLLLoss bfloat16 [32_21_256_256] 255 noncontiguous mean fwd 260348 160606 1.621035329
NLLLoss bfloat16 [32_21_256_256] 255 noncontiguous mean bwd 1003296 161459 6.213936665
NLLLoss bfloat16 [55_21_112_257] 255 noncontiguous mean fwd 195629 123894 1.579003019
NLLLoss bfloat16 [55_21_112_257] 255 noncontiguous mean bwd 240604 145779 1.650470918
NLLLoss bfloat16 [24_21_512_512] 255 noncontiguous mean fwd 767348 756664 1.014119874
NLLLoss bfloat16 [24_21_512_512] 255 noncontiguous mean bwd 1953938 1142196 1.710685381
NLLLoss bfloat16 [2_3_128_128_128] -100 contiguous none fwd 102052 93705 1.089077424
NLLLoss bfloat16 [2_3_128_128_128] -100 contiguous none bwd 169734 98362 1.725605417
NLLLoss bfloat16 [2_3_128_128_128] -100 noncontiguous none fwd 177912 260066 0.684103266
NLLLoss bfloat16 [2_3_128_128_128] -100 noncontiguous none bwd 168646 240547 0.701093757
NLLLoss bfloat16 [2_3_128_128_128] -100 contiguous mean fwd 169607 139197 1.218467352
NLLLoss bfloat16 [2_3_128_128_128] -100 contiguous mean bwd 199783 88958 2.245812631
NLLLoss bfloat16 [2_3_128_128_128] -100 noncontiguous mean fwd 252039 311443 0.809262048
NLLLoss bfloat16 [2_3_128_128_128] -100 noncontiguous mean bwd 200953 213276 0.942220409
NLLLoss bfloat16 [2_3_128_128_128] -100 contiguous sum fwd 166566 139286 1.195856009
NLLLoss bfloat16 [2_3_128_128_128] -100 contiguous sum bwd 200568 89065 2.251928367
NLLLoss bfloat16 [2_3_128_128_128] -100 noncontiguous sum fwd 247093 313985 0.786957976
NLLLoss bfloat16 [2_3_128_128_128] -100 noncontiguous sum bwd 200342 238840 0.838812594
NLLLoss bfloat16 [256_81_8732] -100 contiguous none fwd 394940 164851 2.395739183
NLLLoss bfloat16 [256_81_8732] -100 contiguous none bwd 895980 377203 2.375325753
NLLLoss bfloat16 [256_81_8732] -100 noncontiguous none fwd 1411041 91287 15.45719544
NLLLoss bfloat16 [256_81_8732] -100 noncontiguous none bwd 4962643 122825 40.40417667
NLLLoss bfloat16 [256_81_8732] -100 contiguous mean fwd 208918 202219 1.033127451
NLLLoss bfloat16 [256_81_8732] -100 contiguous mean bwd 771240 373861 2.062905732
NLLLoss bfloat16 [256_81_8732] -100 noncontiguous mean fwd 1238918 128317 9.655135329
NLLLoss bfloat16 [256_81_8732] -100 noncontiguous mean bwd 769085 109474 7.025275408
NLLLoss bfloat16 [256_81_8732] -100 contiguous sum fwd 205830 202895 1.01446561
NLLLoss bfloat16 [256_81_8732] -100 contiguous sum bwd 770375 374110 2.05922055
NLLLoss bfloat16 [256_81_8732] -100 noncontiguous sum fwd 1233389 128122 9.626676137
NLLLoss bfloat16 [256_81_8732] -100 noncontiguous sum bwd 769259 109687 7.013219433
NLLLoss bfloat16 [256_100] -100 contiguous none fwd 10224 9264 1.103626943
NLLLoss bfloat16 [256_100] -100 contiguous none bwd 15056 7948 1.894313035
NLLLoss bfloat16 [256_100] -100 noncontiguous none fwd 16291 9210 1.768838219
NLLLoss bfloat16 [256_100] -100 noncontiguous none bwd 21135 7841 2.695447009
NLLLoss bfloat16 [256_100] -100 contiguous mean fwd 15760 16126 0.977303733
NLLLoss bfloat16 [256_100] -100 contiguous mean bwd 15968 6881 2.320592937
NLLLoss bfloat16 [256_100] -100 noncontiguous mean fwd 19171 15860 1.208764187
NLLLoss bfloat16 [256_100] -100 noncontiguous mean bwd 21891 6970 3.140746055
NLLLoss bfloat16 [256_100] -100 contiguous sum fwd 15488 16411 0.943757236
NLLLoss bfloat16 [256_100] -100 contiguous sum bwd 14864 6934 2.143640035
NLLLoss bfloat16 [256_100] -100 noncontiguous sum fwd 19070 15878 1.201032876
NLLLoss bfloat16 [256_100] -100 noncontiguous sum bwd 20670 6827 3.027684195
NLLLoss bfloat16 [40_2] -100 contiguous none fwd 8560 7307 1.171479403
NLLLoss bfloat16 [40_2] -100 contiguous none bwd 14528 7396 1.96430503
NLLLoss bfloat16 [40_2] -100 noncontiguous none fwd 10996 7467 1.47261283
NLLLoss bfloat16 [40_2] -100 noncontiguous none bwd 16000 7182 2.227791701
NLLLoss bfloat16 [40_2] -100 contiguous mean fwd 8544 13973 0.611464968
NLLLoss bfloat16 [40_2] -100 contiguous mean bwd 10416 6382 1.632090254
NLLLoss bfloat16 [40_2] -100 noncontiguous mean fwd 11927 13102 0.910319035
NLLLoss bfloat16 [40_2] -100 noncontiguous mean bwd 15098 6471 2.333178798
NLLLoss bfloat16 [40_2] -100 contiguous sum fwd 6960 13600 0.511764706
NLLLoss bfloat16 [40_2] -100 contiguous sum bwd 8944 6524 1.370938075
NLLLoss bfloat16 [40_2] -100 noncontiguous sum fwd 11665 13529 0.862221894
NLLLoss bfloat16 [40_2] -100 noncontiguous sum bwd 14516 6684 2.171753441
NLLLoss bfloat16 [8192_52100] -100 contiguous none fwd 28177 14276 1.973732138
NLLLoss bfloat16 [8192_52100] -100 contiguous none bwd 784960 14738 53.26095807
NLLLoss bfloat16 [8192_52100] -100 noncontiguous none fwd 2432283 14347 169.5325155
NLLLoss bfloat16 [8192_52100] -100 noncontiguous none bwd 3191033 14632 218.0859076
NLLLoss bfloat16 [8192_52100] -100 contiguous mean fwd 409576 24605 16.64604755
NLLLoss bfloat16 [8192_52100] -100 contiguous mean bwd 1033941 12925 79.9954352
NLLLoss bfloat16 [8192_52100] -100 noncontiguous mean fwd 2758091 24783 111.289634
NLLLoss bfloat16 [8192_52100] -100 noncontiguous mean bwd 3430700 13085 262.1857088
NLLLoss bfloat16 [8192_52100] -100 contiguous sum fwd 408184 24516 16.64969816
NLLLoss bfloat16 [8192_52100] -100 contiguous sum bwd 1037012 12960 80.01635802
NLLLoss bfloat16 [8192_52100] -100 noncontiguous sum fwd 2758670 24641 111.9544661
NLLLoss bfloat16 [8192_52100] -100 noncontiguous sum bwd 3435032 12711 270.2408937
NLLLoss bfloat16 [20480_50000] -100 contiguous none fwd 35152 20267 1.734445157
NLLLoss bfloat16 [20480_50000] -100 contiguous none bwd 1854038 21280 87.12584586
NLLLoss bfloat16 [20480_50000] -100 noncontiguous none fwd 5798155 20355 284.8516335
NLLLoss bfloat16 [20480_50000] -100 noncontiguous none bwd 7598346 20728 366.5740062
NLLLoss bfloat16 [20480_50000] -100 contiguous mean fwd 998743 28462 35.09040124
NLLLoss bfloat16 [20480_50000] -100 contiguous mean bwd 2439899 20017 121.8913424
NLLLoss bfloat16 [20480_50000] -100 noncontiguous mean fwd 6649970 27786 239.3280789
NLLLoss bfloat16 [20480_50000] -100 noncontiguous mean bwd 8159293 18880 432.1659428
NLLLoss bfloat16 [20480_50000] -100 contiguous sum fwd 992261 28800 34.45350694
NLLLoss bfloat16 [20480_50000] -100 contiguous sum bwd 3298188 19715 167.29333
NLLLoss bfloat16 [20480_50000] -100 noncontiguous sum fwd 6624722 28978 228.6121195
NLLLoss bfloat16 [20480_50000] -100 noncontiguous sum bwd 8167933 20533 397.7954025
Average improvement over ROCm across all cases:

Contiguous:

type Forward Backward
float16 5.27 4.09
float32 4.03 2.82
bfloat16 5.28 4.03

Non-contiguous:

type Forward Backward
float16 3.84 3.51
float32 3.49 2.98
bfloat16 3.83 4.53

Reduction:

type Forward Backward
float16 1.66 2.00
float32 1.71 1.97
bfloat16 1.66 2.03
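
For readers checking the numbers, the "improvement over rocm" column in the rows above is consistent with dividing the ROCm kernel time by the MIOpen kernel time. The sketch below recomputes it for four bfloat16 rows copied from the table. The function name `speedup` is hypothetical, and the use of a plain arithmetic mean is an assumption: the PR does not state how the averages were aggregated.

```python
from statistics import mean

# (rocm_kernel_avg, miopen_kernel_time) pairs copied from four bfloat16
# noncontiguous "mean" rows of the benchmark table (units as reported).
rows = [
    (692383, 491729),  # [64_21_270_333] fwd
    (892250, 609061),  # [64_21_270_333] bwd
    (601331, 399588),  # [64_21_237_329] fwd
    (717552, 472939),  # [64_21_237_329] bwd
]


def speedup(rocm_time: float, miopen_time: float) -> float:
    """Return ROCm time divided by MIOpen time; > 1 means MIOpen was faster."""
    return rocm_time / miopen_time


ratios = [speedup(rocm, miopen) for rocm, miopen in rows]

# Assumption: per-case ratios are combined with an arithmetic mean.
average = mean(ratios)

for (rocm, miopen), ratio in zip(rows, ratios):
    print(f"{rocm:>8} / {miopen:>8} = {ratio:.9f}")
print(f"average over these rows: {average:.2f}")
```

Values below 1 in the table, such as some of the small-shape reduction cases, indicate cases where the MIOpen kernel was slower.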

These averages exclude the following cases with a large number of classes, where MIOpen outperforms ROCm by a wide margin:

Input size = [8192 52100] (N, C)
op_name dtype contiguous reduction ignore_index direction rocm_pytorch_time miopen_hip_time improvement
NLLLoss float16 true none -100 fwd 28769 14437 1.99
NLLLoss float16 true none -100 bwd 784551 14260 55.01
NLLLoss float16 false none -100 fwd 2435181 14579 167.03
NLLLoss float16 false none -100 bwd 3194068 14455 220.96
NLLLoss float16 false mean -100 fwd 2759594 24642 111.98
NLLLoss float16 false mean -100 bwd 3429560 12783 268.29
NLLLoss float16 false sum -100 fwd 2760448 25104 109.96
NLLLoss float16 false sum -100 bwd 3429032 12961 264.56
NLLLoss float32 true none -100 fwd 33745 15574 2.16
NLLLoss float32 true none -100 bwd 1184589 25957 45.63
NLLLoss float32 false none -100 fwd 3299934 15396 214.33
NLLLoss float32 false none -100 bwd 4476791 24890 179.86
NLLLoss float32 false mean -100 fwd 3642497 34135 106.70
NLLLoss float32 false mean -100 bwd 4738202 14152 334.80
NLLLoss float32 false sum -100 fwd 3649286 25565 142.74
NLLLoss float32 false sum -100 bwd 4741964 24428 194.12
NLLLoss bfloat16 true none -100 fwd 28177 14276 1.97
NLLLoss bfloat16 true none -100 bwd 784960 14738 53.26
NLLLoss bfloat16 false none -100 fwd 2432283 14347 169.53
NLLLoss bfloat16 false none -100 bwd 3191033 14632 218.08
NLLLoss bfloat16 false mean -100 fwd 2758091 24783 111.28
NLLLoss bfloat16 false mean -100 bwd 3430700 13085 262.18
NLLLoss bfloat16 false sum -100 fwd 2758670 24641 111.95
NLLLoss bfloat16 false sum -100 bwd 3435032 12711 270.24
Input size = [20480 50000] (N, C)
op_name dtype contiguous reduction ignore_index direction rocm_pytorch_time miopen_hip_time improvement
NLLLoss float16 true none -100 fwd 35521 20285 1.75
NLLLoss float16 true none -100 bwd 1852737 21547 85.98
NLLLoss float16 false none -100 fwd 5804885 19752 293.88
NLLLoss float16 false none -100 bwd 7600988 21049 361.10
NLLLoss float16 false mean -100 fwd 6632306 28800 230.28
NLLLoss float16 false mean -100 bwd 8166788 19449 419.90
NLLLoss float16 false sum -100 fwd 6631614 27697 239.43
NLLLoss float16 false sum -100 bwd 8173236 19164 426.48
NLLLoss float32 true none -100 fwd 42704 23501 1.81
NLLLoss float32 true none -100 bwd 2804726 29812 94.08
NLLLoss float32 false none -100 fwd 7850643 23217 338.14
NLLLoss float32 false none -100 bwd 10698810 23591 453.51
NLLLoss float32 false mean -100 fwd 8752699 37706 232.13
NLLLoss float32 false mean -100 bwd 11695705 28764 406.60
NLLLoss float32 false sum -100 fwd 8711305 32585 267.34
NLLLoss float32 false sum -100 bwd 11648068 26008 447.86
NLLLoss bfloat16 true none -100 fwd 35152 20267 1.73
NLLLoss bfloat16 true none -100 bwd 1854038 21280 87.12
NLLLoss bfloat16 false none -100 fwd 5798155 20355 284.85
NLLLoss bfloat16 false none -100 bwd 7598346 20728 366.57
NLLLoss bfloat16 false mean -100 fwd 6649970 27786 239.32
NLLLoss bfloat16 false mean -100 bwd 8159293 18880 432.16
NLLLoss bfloat16 false sum -100 fwd 6624722 28978 228.61
NLLLoss bfloat16 false sum -100 bwd 8167933 20533 397.79
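
For orientation, here is a minimal reference sketch of the operation being benchmarked on the (N, C) shapes above. It assumes the commonly documented PyTorch-style semantics (unweighted, with `ignore_index` and `none`/`sum`/`mean` reductions). It is not the HIP kernel from this PR, and the function names are hypothetical.

```python
import math


def nll_loss(log_probs, targets, ignore_index=-100, reduction="mean"):
    """Forward pass for an (N, C) list of log-probabilities, unweighted."""
    losses = []
    counted = 0
    for row, target in zip(log_probs, targets):
        if target == ignore_index:
            losses.append(0.0)          # ignored samples contribute nothing
        else:
            losses.append(-row[target])  # only one of the C entries is read
            counted += 1
    if reduction == "none":
        return losses
    total = sum(losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        # Mean over non-ignored samples; NaN when every sample is ignored.
        return total / counted if counted else math.nan
    raise ValueError(f"unknown reduction: {reduction}")


def nll_loss_backward(num_classes, targets, ignore_index=-100, reduction="mean"):
    """Gradient w.r.t. the input for the scalar reductions (upstream grad = 1)."""
    if reduction not in ("sum", "mean"):
        raise ValueError("this sketch covers only 'sum' and 'mean'")
    counted = sum(1 for t in targets if t != ignore_index)
    scale = 1.0
    if reduction == "mean":
        scale = 1.0 / counted if counted else 0.0  # simplification for the empty case
    grad = [[0.0] * num_classes for _ in targets]
    for i, target in enumerate(targets):
        if target != ignore_index:
            grad[i][target] = -scale  # a single non-zero entry per sample
    return grad


log_probs = [[-0.5, -1.0], [-2.0, -0.1], [-0.3, -1.2]]
targets = [0, 1, -100]
print(nll_loss(log_probs, targets, reduction="none"))
print(nll_loss(log_probs, targets, reduction="sum"))
print(nll_loss(log_probs, targets, reduction="mean"))
```

In this formulation the forward pass reads one element per sample and the backward pass writes one non-zero element per sample, regardless of the number of classes C.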

@hieule88 hieule88 marked this pull request as ready for review October 1, 2024 04:13
@@ -38,3 +38,4 @@ The MIOpen API library is structured as follows:
* :doc:`RotaryPositionalEmbeddings <../doxygen/html/group__RotaryPositionalEmbeddings>` (experimental)
* :doc:`ReLU <../doxygen/html/group___re_l_u>` (experimental)
* :doc:`GLU <../doxygen/html/group__glu>` (experimental)
* :doc:`NLLLoss<../doxygen/html/group__nllloss>` (experimental)
Collaborator

A space is missing between NLLLoss and <...>. Should it be * :doc:`NLLLoss <../doxygen/html/group__nllloss>` (experimental)?

Collaborator Author

fixed it
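
The spacing issue raised in this thread can be caught mechanically. The sketch below is a hypothetical helper, not part of MIOpen's tooling, that rewrites `:doc:` roles whose title runs straight into `<` so they match the neighbouring entries. Whether Sphinx resolves the unspaced form is not discussed in the thread, so this is only a consistency check.

```python
import re

# Matches :doc:`Title<target>` where the last character of the title is
# immediately followed by "<" (no separating whitespace).
DOC_ROLE_NO_SPACE = re.compile(r":doc:`([^`<]*[^`<\s])<([^`>]*)>`")


def fix_doc_role_spacing(line: str) -> str:
    """Insert a single space between the title and the <target> part."""
    return DOC_ROLE_NO_SPACE.sub(r":doc:`\1 <\2>`", line)


before = "* :doc:`NLLLoss<../doxygen/html/group__nllloss>` (experimental)"
print(fix_doc_role_spacing(before))
```

Lines that already contain the space, such as the GLU entry in the same list, are left unchanged.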

@hieule88 hieule88 requested a review from iq136boy October 3, 2024 10:57
Contributor

@iq136boy iq136boy left a comment

CI failed log:
3168_error_log.txt

@@ -40,4 +40,3 @@ do
"$format" -i -style=file "$file"
fi
done
Collaborator

Is this an unintended modification?

@long10024070 long10024070 marked this pull request as draft December 2, 2024 06:27
@hieule88 hieule88 marked this pull request as ready for review December 20, 2024 10:07
@hieule88 hieule88 marked this pull request as draft December 20, 2024 10:26
@hieule88
Collaborator Author

I found a bug and need to fix it before reopening.

@hieule88 hieule88 marked this pull request as ready for review December 23, 2024 03:41
7 participants