Parallelize the binary parsing of function bodies #7134

Draft · wants to merge 1 commit into base: binary-reader-pass-pos

Conversation

tlively (Member) commented Dec 4, 2024

After a linear scan through the code section and input source map to
find the start locations corresponding to each function body, parse the
locals and instructions for each function in parallel.

This speeds up binary parsing with a source map by about 20% with 8 cores
on my machine, but only by about 2% with all 128 cores, so this
parallelization has potential but suffers from scaling overhead¹.

When running a full -O3 optimization pipeline, the parallel parsing
slightly reduces the number of minor page faults, presumably because it
allocates the original instructions in separate thread-local arenas. It
also slightly reduces the max RSS, but these improvements do not
translate into better overall performance. In fact, overall performance
is slightly lower with this change (at least on my machine).

Footnotes

1. FWIW the full -O3 pipeline also performs significantly better with 8 cores than with 128 cores on my machine.
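To make the two-phase structure concrete, here is a minimal sketch of the approach described above. All names (`BodyRange`, `scanBodies`, `parseBody`) are illustrative rather than Binaryen's actual parser API, and the single-byte size prefix stands in for wasm's LEB128 encoding.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// A [start, start + size) range for one function body in the code section.
struct BodyRange {
  std::size_t start;
  std::size_t size;
};

// Phase 1: sequential scan. Real wasm prefixes each body with a LEB128 size;
// a single-byte prefix keeps this sketch short. The scan just hops from one
// size prefix to the next, so it is cheap relative to full parsing.
static std::vector<BodyRange> scanBodies(const std::vector<std::uint8_t>& code) {
  std::vector<BodyRange> bodies;
  std::size_t pos = 0;
  while (pos < code.size()) {
    std::size_t size = code[pos++];
    bodies.push_back({pos, size});
    pos += size;
  }
  return bodies;
}

// Phase 2: parse one body. A real parser would decode locals and instructions
// and build IR here; allocations made on a worker thread land in that
// thread's arena.
static void parseBody(const std::vector<std::uint8_t>& code,
                      const BodyRange& body) {
  volatile std::uint8_t sink = 0;
  for (std::size_t i = body.start; i < body.start + body.size; ++i) {
    sink = sink ^ code[i]; // stand-in for real decoding work
  }
}

// Bodies depend only on their own bytes, so workers can claim them from a
// shared atomic counter and parse them in parallel with no locking.
void parseAllBodies(const std::vector<std::uint8_t>& code) {
  const std::vector<BodyRange> bodies = scanBodies(code);
  std::atomic<std::size_t> next{0};
  std::size_t numWorkers = std::thread::hardware_concurrency();
  if (numWorkers == 0) {
    numWorkers = 1; // hardware_concurrency() may return 0
  }
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < numWorkers; ++t) {
    workers.emplace_back([&] {
      for (std::size_t i = next.fetch_add(1); i < bodies.size();
           i = next.fetch_add(1)) {
        parseBody(code, bodies[i]);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
}

int main() {
  // Fake code section: bodies of sizes 3, 1, and 2, each size-prefixed.
  std::vector<std::uint8_t> code = {3, 0xAA, 0xBB, 0xCC, 1, 0xDD, 2, 0xEE, 0xFF};
  parseAllBodies(code);
}
```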

tlively (Member, Author) commented Dec 9, 2024

After #7142, the parallelization here shows much larger improvements for parsing alone, but it still slows down a full -O3 pipeline by a few percent, so we probably shouldn't land it.

kripken (Member) commented Dec 9, 2024

Interesting, what is the reason for the slowdown? I'd expect optimizations to be unaffected, naively.

If this is a massive speedup for parsing and only a tiny slowdown for optimization, it could still be worthwhile (to help e.g. emscripten-core/emscripten#23034).

tlively (Member, Author) commented Dec 9, 2024

Parsing speed changes by -22% to +6% depending on how many cores are used. But parsing is generally so quick that I think it makes more sense to optimize for the speed of the optimization pipeline.

The only possible explanation (that I have thought of!) for optimization performance differences is memory layout. I would have expected the parallel parser to have a more efficient memory layout with the IR for each function allocated in the local arena for the thread that operates on that function, but I guess that's not the case. Perhaps the fragmentation hurts memory locality for global optimizations more than it helps memory locality for function-parallel optimizations.
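A toy model of that hypothesis, assuming a simple bump-style thread-local arena (this `BumpArena` is illustrative, not Binaryen's actual arena implementation):

```cpp
#include <cstddef>
#include <vector>

// A toy bump allocator standing in for a thread-local arena.
class BumpArena {
  static constexpr std::size_t kChunkSize = 1 << 16;
  std::vector<std::vector<char>> chunks;
  std::size_t used = kChunkSize; // forces a chunk on the first allocation

public:
  // Hands out pointers by bumping an offset; assumes bytes <= kChunkSize.
  void* allocate(std::size_t bytes) {
    // Round up so every allocation is suitably aligned for IR node types.
    bytes = (bytes + alignof(std::max_align_t) - 1) &
            ~(alignof(std::max_align_t) - 1);
    if (used + bytes > kChunkSize) {
      chunks.emplace_back(kChunkSize);
      used = 0;
    }
    void* ptr = chunks.back().data() + used;
    used += bytes;
    return ptr;
  }
};

// One arena per thread: parallel parsing allocates without contention, but
// IR for consecutive functions can end up scattered across arenas, which may
// hurt locality for passes that walk the whole module in order.
thread_local BumpArena localArena;

// Example: a worker thread allocating a node from its own arena.
void* allocateNode(std::size_t bytes) { return localArena.allocate(bytes); }
```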

kripken (Member) commented Dec 10, 2024

Interesting. Yeah, layout seems like the plausible explanation. I wouldn't have guessed it beforehand, but I suppose the simple memory layout that we get in the single-core parser ends up efficient somehow, as you say. Given that, I agree this likely isn't worth landing. Too bad, but good investigation!
