Parallelize the binary parsing of function bodies #7134

Draft · wants to merge 1 commit into base: binary-reader-pass-pos

Conversation

tlively (Member) commented Dec 4, 2024

After a linear scan through the code section and input source map to
find the start locations corresponding to each function body, parse the
locals and instructions for each function in parallel.

This speeds up binary parsing with a source map by about 20% with 8 cores
on my machine, but only by about 2% with all 128 cores, so this
parallelization has potential but suffers from scaling overhead¹.

When running a full -O3 optimization pipeline, the parallel parsing
slightly reduces the number of minor page faults, presumably because it
allocates the original instructions in separate thread-local arenas. It
also slightly reduces the max RSS, but these improvements do not
translate into better overall performance. In fact, overall performance
is slightly lower with this change (at least on my machine).

Footnotes

1. FWIW the full -O3 pipeline also performs significantly better with 8 cores than with 128 cores on my machine.
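To make the two-phase structure concrete, here is a minimal sketch of the approach described above. All names (`BodyRange`, `scanBodies`, `parseBody`) are illustrative rather than Binaryen's actual parser API, and the single-byte size prefix stands in for wasm's LEB128 encoding.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// A [start, start + size) range for one function body in the code section.
struct BodyRange {
  std::size_t start;
  std::size_t size;
};

// Phase 1: sequential scan. Real wasm prefixes each body with a LEB128 size;
// a single-byte prefix keeps this sketch short. The scan just hops from one
// size prefix to the next, so it is cheap relative to full parsing.
static std::vector<BodyRange> scanBodies(const std::vector<std::uint8_t>& code) {
  std::vector<BodyRange> bodies;
  std::size_t pos = 0;
  while (pos < code.size()) {
    std::size_t size = code[pos++];
    bodies.push_back({pos, size});
    pos += size;
  }
  return bodies;
}

// Phase 2: parse one body. A real parser would decode locals and instructions
// and build IR here; allocations made on a worker thread land in that
// thread's arena.
static void parseBody(const std::vector<std::uint8_t>& code,
                      const BodyRange& body) {
  volatile std::uint8_t sink = 0;
  for (std::size_t i = body.start; i < body.start + body.size; ++i) {
    sink = sink ^ code[i]; // stand-in for real decoding work
  }
}

// Bodies depend only on their own bytes, so workers can claim them from a
// shared atomic counter and parse them in parallel with no locking.
void parseAllBodies(const std::vector<std::uint8_t>& code) {
  const std::vector<BodyRange> bodies = scanBodies(code);
  std::atomic<std::size_t> next{0};
  std::size_t numWorkers = std::thread::hardware_concurrency();
  if (numWorkers == 0) {
    numWorkers = 1; // hardware_concurrency() may return 0
  }
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < numWorkers; ++t) {
    workers.emplace_back([&] {
      for (std::size_t i = next.fetch_add(1); i < bodies.size();
           i = next.fetch_add(1)) {
        parseBody(code, bodies[i]);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
}

int main() {
  // Fake code section: bodies of sizes 3, 1, and 2, each size-prefixed.
  std::vector<std::uint8_t> code = {3, 0xAA, 0xBB, 0xCC, 1, 0xDD, 2, 0xEE, 0xFF};
  parseAllBodies(code);
}
```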

tlively (Member, Author) commented Dec 9, 2024

After #7142, the parallelization here shows much larger improvements for parsing alone, but it still slows down a full -O3 pipeline by a few percent, so we probably shouldn't land it.

kripken (Member) commented Dec 9, 2024

Interesting, what is the reason for the slowdown? I'd expect optimizations to be unaffected, naively.

If this is a massive speedup for parsing and only a tiny slowdown for optimization, it could still be worthwhile (to help e.g. emscripten-core/emscripten#23034).

tlively (Member, Author) commented Dec 9, 2024

Parsing speed changes by -22% to +6% depending on how many cores are used. But parsing is generally so quick that I think it makes more sense to optimize for the speed of the optimization pipeline.

The only possible explanation (that I have thought of!) for optimization performance differences is memory layout. I would have expected the parallel parser to have a more efficient memory layout with the IR for each function allocated in the local arena for the thread that operates on that function, but I guess that's not the case. Perhaps the fragmentation hurts memory locality for global optimizations more than it helps memory locality for function-parallel optimizations.
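A toy model of that hypothesis, assuming a simple bump-style thread-local arena (this `BumpArena` is illustrative, not Binaryen's actual arena implementation):

```cpp
#include <cstddef>
#include <vector>

// A toy bump allocator standing in for a thread-local arena.
class BumpArena {
  static constexpr std::size_t kChunkSize = 1 << 16;
  std::vector<std::vector<char>> chunks;
  std::size_t used = kChunkSize; // forces a chunk on the first allocation

public:
  // Hands out pointers by bumping an offset; assumes bytes <= kChunkSize.
  void* allocate(std::size_t bytes) {
    // Round up so every allocation is suitably aligned for IR node types.
    bytes = (bytes + alignof(std::max_align_t) - 1) &
            ~(alignof(std::max_align_t) - 1);
    if (used + bytes > kChunkSize) {
      chunks.emplace_back(kChunkSize);
      used = 0;
    }
    void* ptr = chunks.back().data() + used;
    used += bytes;
    return ptr;
  }
};

// One arena per thread: parallel parsing allocates without contention, but
// IR for consecutive functions can end up scattered across arenas, which may
// hurt locality for passes that walk the whole module in order.
thread_local BumpArena localArena;

// Example: a worker thread allocating a node from its own arena.
void* allocateNode(std::size_t bytes) { return localArena.allocate(bytes); }
```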

kripken (Member) commented Dec 10, 2024

Interesting. Yeah, layout seems like the plausible explanation. I wouldn't have guessed it beforehand, but I suppose the simple memory layout that we get in the single-core parser ends up efficient somehow, as you say. Given that, I agree this likely isn't worth landing. Too bad, but good investigation!
