Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix benchmarker csv format, time and memory results output at same time + Emscripten O2 #248

Merged
merged 1 commit into from
Jun 18, 2024

Conversation

vorosl
Copy link
Contributor

@vorosl vorosl commented May 13, 2024

No description provided.

@vorosl vorosl changed the title Fix benchmarker csv format, support time and memory output together Fix benchmarker csv format, support time and memory output at same time May 13, 2024
@vorosl vorosl changed the title Fix benchmarker csv format, support time and memory output at same time Fix benchmarker csv format, time and memory results output at same time May 13, 2024
Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

test/wasmBenchmarker/benchmark.py Show resolved Hide resolved
@vorosl
Copy link
Contributor Author

vorosl commented May 17, 2024

I've added -O3 flag to Emscripten

@vorosl vorosl requested a review from zherczeg May 17, 2024 09:25
@vorosl vorosl changed the title Fix benchmarker csv format, time and memory results output at same time Fix benchmarker csv format, time and memory results output at same time + Emscripten O3 May 17, 2024
@@ -149,6 +150,7 @@ def compile_tests(emcc_path, path, only_game, only_simd, compile_anyway, run, ve
flags = "-msimd128" if file.startswith("simd") else ""
flags += (" -s WASM=1 -s EXPORTED_FUNCTIONS=_runtime"
" -s EXPORTED_RUNTIME_METHODS=ccall,cwrap"
" -O3"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add opt level 3 here?
AFAIK level 2 (-O2) option is commonly used, and Walrus is also built with -O2 in release mode.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly for testing. With -O3, tests are optimized as best as possible, which pushes the limits of register allocation as far as possible (most things are cached in locals). But -O2 can be used as well. Which one you prefer?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem with -O3 option, but I think that -O2 would be enough because its generally used and safe.

BTW this Opt option is for Emscripten which generates WASM files from c/c++ source codes.
Will different Opt levels (-O2, -O3) affect the output WASM code significantly?
Do you have any idea? (I'm just curious about it)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will check it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interpreter

  Test                   O2 interp   O3 interp   O2/O3 interp
  ---------------------- ----------- ----------- -------------------
  change                 3.418       3.595       0.95076495132128
  fannkuch               0.314       0.326       0.96319018404908
  fibonacci              3.508       4.108       0.853943524829601
  gregory                3.362       4.874       0.689782519491178
  hanoi                  5.605       6.513       0.860586519269154
  heapsort               3.285       3.902       0.841875961045618
  kNucleotide            3.103       3.381       0.917775805974564
  mandelbrotDouble       8.905       6.823       1.30514436464898
  mandelbrotFloat        8.675       5.469       1.58621320168221
  matrixMultiply         8.955       10.069      0.889363392591121
  miniWalrus             1.579       1.710       0.923391812865497
  nbody                  5.842       4.303       1.35765744829189
  nqueens                4.484       5.229       0.85752533945305
  prime                  2.968       3.691       0.804118125169331
  quickSort              3.406       4.344       0.784069981583794
  redBlack               4.558       4.832       0.943294701986755
  salesman               3.965       4.002       0.990754622688656
  simdMandelbrotDouble   3.385       3.757       0.900984828320468
  simdMandelbrotFloat    3.429       3.721       0.921526471378662
  simdMatrixMultiply     3.416       4.346       0.786010124252186
  simdNbody              4.897       3.394       1.4428403064231
  ticTacToe              3.948       4.157       0.949723358191003

JIT

Test                   O2 jit   O3 jit   O2/O3 jit
---------------------- -------- -------- -------------------
change                 1.014    0.998    1.01603206412826
fannkuch               0.048    0.047    1.02127659574468
fibonacci              1.806    1.890    0.955555555555556
gregory                1.275    1.111    1.14761476147615
hanoi                  0.957    1.048    0.913167938931298
heapsort               0.531    0.541    0.981515711645102
kNucleotide            0.275    0.317    0.867507886435331
mandelbrotDouble       1.884    1.713    1.09982486865149
mandelbrotFloat        1.716    1.533    1.11937377690802
matrixMultiply         0.709    0.728    0.973901098901099
miniWalrus             0.122    0.154    0.792207792207792
nbody                  0.721    0.689    1.04644412191582
nqueens                0.451    0.469    0.961620469083156
prime                  0.148    0.135    1.0962962962963
quickSort              0.290    0.329    0.881458966565349
redBlack               0.789    0.867    0.910034602076125
salesman               0.484    0.479    1.01043841336117
simdMandelbrotDouble   0.180    0.172    1.04651162790698
simdMandelbrotFloat    0.138    0.140    0.985714285714286
simdMatrixMultiply     0.434    0.436    0.995412844036697
simdNbody              0.321    0.334    0.961077844311377
ticTacToe              0.886    0.885    1.00112994350283

Codesize

  Test                   O2 codesize   O3 codesize   O2/O3 codesize
  ---------------------- ------------- ------------- -------------------
  change                 6906          6619          1.04336002417284
  fannkuch               7891          7621          1.035428421467
  fibonacci              6702          6415          1.04473889321902
  gregory                10102         9815          1.02924095771778
  hanoi                  7901          7716          1.02397615344738
  heapsort               7756          7476          1.0374531835206
  kNucleotide            14583         14282         1.0210754796247
  mandelbrotDouble       7122          8345          0.853445176752546
  mandelbrotFloat        7058          8145          0.866543891958257
  matrixMultiply         11265         10973         1.02661077189465
  miniWalrus             8977          8683          1.03385926523091
  nbody                  12327         13167         0.936204146730463
  nqueens                8002          7715          1.03720025923526
  prime                  6833          6546          1.04384356859151
  quickSort              9985          9683          1.03118868119384
  redBlack               10833         10512         1.03053652968037
  salesman               7924          8331          0.951146320969872
  simdMandelbrotDouble   7384          7097          1.04043962237565
  simdMandelbrotFloat    7558          7271          1.03947187457021
  simdMatrixMultiply     11130         10845         1.02627939142462
  simdNbody              12440         12459         0.998474997993418
  ticTacToe              8046          7761          1.03672207189795

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the analysis.
It looks like that opt level does not affect the overall code quality and size significantly 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we use? O2 or O3?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO O2 would be enough because this opt level is commonly adopted.

Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are tests with x (exec) rights enabled, could you remove this x flag?

"kNucleotide": 1,
"mandelbrotFloat": 775007,
"mandelbrotDouble": 775007,
"kNucleotide": 0, # 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a bool? Could we have a better check for successful run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a bool, but the check function is in the test. So, the test runs the algorithm, and after that it checks the values.

}

uint64_t runtime() {
return fibonacci(26);
return fibonacci(39);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the result fit into 64 bit? Fibonacci generates pretty big numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 63245986 fits into 64 bits.

@@ -316,4 +315,88 @@ const char *input_3 =
"tctgtcccagaacagctccacaagtttttttacagccgaaacccctgtgtgaatcttaat"
"atccaagcgcgttatctgattagagtttacaactcagtattttatcagtacgttttgttt"
"ccaacattacccggtatgacaaaatgacgccacgtgtcgaataatggtctgaccaatgta"
"ggaagtgaaaagataaatat"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a random addition? This code is a human dna :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is random. I'll look after another human DNA-s

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe running the original in a loop could be easier. I think even the test concatenates the input multiple times.

return retVal;
}

int main() {
int main()
{
printf("%u\n", runtime());
return 0;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a newline at the end.

@@ -228,22 +228,134 @@ int runtime() {
"te eum quod blandit. Vim ad legimus intellegebat "
"disputationi, an ridens nonumes deterruisset sed. Eos "
"dicunt assentior eu, mel ne case postulant, "
"quando feugiat voluptaria eam ne.\n";
"quando feugiat voluptaria eam ne.\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a duplication, or the continuation of lorem ipsum?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've generated Lorem Ipsum, and copied it from a part. I think there is some repetitive parts in lipsum, for example, the Lorem ipsum dolor sit amet, but I've searched Integer, and at the beginning of text, there is no word integer

@zherczeg
Copy link
Collaborator

x86 64 bit results on my machine:

        Test case        |     interpreter  |    baseline_jit  |    regalloc_jit  |
-------------------------+------------------+------------------+------------------+
                 change  |           5.42s  |   2.14s (2.54x)  |   1.98s (2.74x)  |
              factorial  |           6.48s  |   1.28s (5.07x)  |  0.38s (17.21x)  |
               fannkuch  |           5.93s  |   1.38s (4.29x)  |   0.75s (7.89x)  |
              fibonacci  |           5.44s  |   3.37s (1.61x)  |   3.25s (1.67x)  |
                gregory  |           5.25s  |   3.19s (1.65x)  |   1.80s (2.91x)  |
                  hanoi  |           8.07s  |   2.20s (3.67x)  |   1.49s (5.43x)  |
               heapsort  |           5.56s  |   1.61s (3.46x)  |   0.73s (7.65x)  |
                huffman  |           4.41s  |   1.24s (3.55x)  |   0.50s (8.77x)  |
            kNucleotide  |           4.89s  |   1.62s (3.02x)  |  0.47s (10.31x)  |
        mandelbrotFloat  |         100.53s  |  10.62s (9.46x)  |  3.11s (32.30x)  |
       mandelbrotDouble  |           7.49s  |   4.76s (1.57x)  |   2.35s (3.19x)  |
         matrixMultiply  |          10.18s  |   2.13s (4.78x)  |  0.67s (15.26x)  |
             miniWalrus  |           2.63s  |   0.62s (4.26x)  |   0.28s (9.50x)  |
                  nbody  |           5.42s  |   1.16s (4.68x)  |   0.66s (8.27x)  |
                nqueens  |           6.34s  |   1.29s (4.90x)  |   0.78s (8.17x)  |
                  prime  |           5.11s  |   1.62s (3.15x)  |  0.25s (20.45x)  |
              quickSort  |           5.82s  |   1.53s (3.81x)  |   0.62s (9.33x)  |
               redBlack  |           0.01s  |   0.01s (0.69x)  |   0.01s (0.62x)  |
                    rsa  |           6.30s  |   3.47s (1.81x)  |   2.69s (2.34x)  |
               salesman  |           6.13s  |   1.24s (4.96x)  |  0.59s (10.44x)  |
    simdMandelbrotFloat  |           6.11s  |   1.50s (4.09x)  |  0.36s (16.89x)  |
   simdMandelbrotDouble  |           5.78s  |   1.28s (4.51x)  |  0.43s (13.54x)  |
              simdNbody  |           4.76s  |   1.10s (4.33x)  |  0.37s (13.01x)  |
     simdMatrixMultiply  |           5.33s  |   1.75s (3.04x)  |   0.69s (7.70x)  |
              ticTacToe  |           5.87s  |   1.73s (3.40x)  |   1.25s (4.70x)  |
-------------------------+------------------+------------------+------------------+
     Average speedup     |                  |           3.69x  |           9.61x  |

Most tests run 4-6s with interpreter, so that is pretty good. Few runs faster, but some are much slower, e.g. mandelbrotFloat is 100s. It seems redblack does not work at all, even with interpreter. Probably #242 hit again.

I will check 32 bit next.

@zherczeg
Copy link
Collaborator

32 bit results, needs #252

        Test case        |     interpreter  |    baseline_jit  |    regalloc_jit  |
-------------------------+------------------+------------------+------------------+
                 change  |           7.46s  |   2.95s (2.53x)  |   2.45s (3.05x)  |
              factorial  |           8.40s  |   1.50s (5.61x)  |  0.64s (13.21x)  |
               fannkuch  |           7.11s  |   1.68s (4.23x)  |   0.75s (9.46x)  |
              fibonacci  |           8.69s  |   5.18s (1.68x)  |   5.12s (1.70x)  |
                gregory  |           9.43s  |   3.61s (2.61x)  |   2.61s (3.61x)  |
                  hanoi  |          10.10s  |   3.05s (3.31x)  |   2.42s (4.17x)  |
               heapsort  |           6.57s  |   1.60s (4.11x)  |   1.00s (6.60x)  |
                huffman  |           5.10s  |   1.09s (4.68x)  |   0.58s (8.74x)  |
            kNucleotide  |           6.05s  |   1.61s (3.76x)  |  0.38s (16.10x)  |
        mandelbrotFloat  |          97.47s  |  10.95s (8.90x)  |  3.74s (26.09x)  |
       mandelbrotDouble  |          14.93s  |   7.23s (2.06x)  |   5.10s (2.92x)  |
         matrixMultiply  |          11.98s  |   1.91s (6.26x)  |  0.87s (13.84x)  |
             miniWalrus  |           3.17s  |   0.46s (6.86x)  |   0.39s (8.07x)  |
                  nbody  |           6.73s  |   1.44s (4.67x)  |   0.75s (8.98x)  |
                nqueens  |           7.94s  |   1.33s (5.99x)  |   0.87s (9.09x)  |
                  prime  |           6.33s  |   1.76s (3.59x)  |   0.97s (6.54x)  |
              quickSort  |           6.96s  |   1.30s (5.36x)  |  0.67s (10.41x)  |
               redBlack  |           0.01s  |   0.01s (0.67x)  |   0.02s (0.64x)  |
                    rsa  |           7.62s  |   4.72s (1.62x)  |   4.40s (1.73x)  |
               salesman  |           7.55s  |   1.31s (5.77x)  |   0.89s (8.51x)  |
    simdMandelbrotFloat  |           7.65s  |   1.26s (6.06x)  |  0.30s (25.41x)  |
   simdMandelbrotDouble  |          11.30s  |  1.05s (10.81x)  |  0.55s (20.58x)  |
              simdNbody  |           5.83s  |   1.23s (4.75x)  |  0.45s (12.80x)  |
     simdMatrixMultiply  |           7.75s  |   1.58s (4.90x)  |  0.51s (15.31x)  |
              ticTacToe  |           7.59s  |   1.71s (4.43x)  |   1.67s (4.54x)  |
-------------------------+------------------+------------------+------------------+
     Average speedup     |                  |           4.61x  |           9.68x  |

@vlacko0930 vlacko0930 force-pushed the benchmarker branch 7 times, most recently from 00afc32 to 367c2eb Compare June 3, 2024 08:32
@vorosl
Copy link
Contributor Author

vorosl commented Jun 3, 2024

It depends on #255

@vlacko0930 vlacko0930 force-pushed the benchmarker branch 2 times, most recently from 236883c to bbb9ad1 Compare June 5, 2024 06:27
emcc_path = None

if system_emcc and os.system("emcc --version >/dev/null") == 0:
if (verbose): print("Emscripten already installed on system")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a common style?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line 103? The prints are in same style in this file.

if os.getenv("EMSDK"):
emcc_path = join(os.getenv("EMSDK"), "upstream/emscripten/emcc.py")
if os.path.exists(emcc_path):
if (verbose): print(f"EMCC already installed: {emcc_path}")
return emcc_path


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these extra newlines?

}

const char *input_3 =
/* ">THREE Homo sapiens frequency" */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this line is removed?

}


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add this newline

@clover2123
Copy link
Collaborator

BTW who is the author among @vorosl @vlacko0930 ?

@vorosl
Copy link
Contributor Author

vorosl commented Jun 11, 2024

BTW who is the author among @vorosl @vlacko0930 ?

@vlacko0930 is my private user, but the git client mixing unfurtunately. :(
Maybe I've fixed it yet.

Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, only a few style changes. Please squash commits and improve title (O3 -> O2)

@@ -105,20 +113,20 @@ def get_emcc(verbose):
if os.path.exists("./emsdk/.git"):
os.system("(cd ./emsdk && git fetch -a) >/dev/null")
os.system("(cd ./emsdk && git reset --hard origin/HEAD) >/dev/null")
os.system("./emsdk/emsdk install latest >/dev/null")
os.system("./emsdk/emsdk activate latest >/dev/null")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this newline needed?


return fibonacci(n - 2);
return fibonacci(n-1) + fibonacci(n-2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why space is removed around dashes?

@@ -18,6 +18,9 @@
#include <stdint.h>
#include <stdbool.h>


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this newline.

@vorosl vorosl changed the title Fix benchmarker csv format, time and memory results output at same time + Emscripten O3 Fix benchmarker csv format, time and memory results output at same time + Emscripten O2 Jun 12, 2024
@vorosl vorosl force-pushed the benchmarker branch 5 times, most recently from cea1235 to 924ab76 Compare June 14, 2024 07:15
@vorosl
Copy link
Contributor Author

vorosl commented Jun 14, 2024

31ec7d6 commit causes the problem at simdNbody

@vorosl
Copy link
Contributor Author

vorosl commented Jun 14, 2024

31ec7d6 commit causes the problem at simdNbody

It is fixed in #266

Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@clover2123 clover2123 merged commit c9e4efd into Samsung:main Jun 18, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants