-
Notifications
You must be signed in to change notification settings - Fork 24
Trade interpreter footprint for performance pre-computing a side table #804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
This is the first step towards adding a value stack in `Thread`. The next step is to merge `locals` and `labels_values` in `Frame`. After that, we can create a value stack in `Thread` by combining the values from the `Frame`'s. google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
Continuation of google#598 towards adding a value stack in `Thread`. google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
Reverted google#605 and keep values separate from locals for now. google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
The first step towards building the side table. The next step is to build the side table in `push_label()`, `pop_label()`, etc. google#46 Co-authored-by: Zhou Fang <[email protected]>
Make validation return side tables, and store them in module. google#46 --------- Co-authored-by: Zhou Fang <[email protected]>
google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
There was an earlier review in zhouwfang#2. google#46 --------- Co-authored-by: Julien Cretin <[email protected]> Co-authored-by: Zhou Fang <[email protected]>
Jumping from "if without else" should be 1 step less than jumping from "if with else". For "if without else", we don't want to jump over the "end" because `exit_label()` need to be [called](https://github.com/zhouwfang/wasefire/blob/de6c7e889865736cb1f11877d5d4e1e9f50729a8/crates/interpreter/src/exec.rs#L788) at the "end". But for "if with else", we want to jump over the "else" because otherwise we would be forced to jump to the "end" at "else" (normally, we would reach "else" after executing the true branch of "if", and we should not execute the "else" branch). This is equivalent to https://github.com/google/wasefire/blob/main/crates/interpreter/src/parser.rs#L566-L569, which will be deprecated with the side table applied in `exec.rs`. google#46 --------- Co-authored-by: Zhou Fang <[email protected]>
When "if" has an "else", we should skip the entry for "else" in the side table. It was addressed in google#690 during execution and should be moved to validation. google#46 Co-authored-by: Zhou Fang <[email protected]>
Apply `delta_ip` and `delta_stp` from side table in `exec.rs`. google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
On my Linux Docker container: 1. CoreMark results of this PR: 37.991516 (in 18.695s), 36.7287 (in 19.258s) CoreMark results of google#690: 37.101162 (in 19.133s), 37.39716 (in 18.996s) Look very similar. Not sure about `nordic`. 2. CoreMark results of `main`: 28.791477 (in 18.09s), 28.814291 (in 17.734s) So there does seem some minor performance improvement with `delta_ip` and `delta_stp`. google#46 --------- Co-authored-by: Zhou Fang <[email protected]>
google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
It improves the performance by removing the costly `last_frame_values_cnt()`. On linux, `CoreMark result: 39.795715 (in 17.853s)`. cf. CoreMark result was about 37 in 19s in google#705. google#46 --------- Co-authored-by: Zhou Fang <[email protected]>
google#46 --------- Co-authored-by: Zhou Fang <[email protected]>
- Apply `val_cnt` and `pop_cnt`. - Remove `Label` by introducing `Frame::labels_cnt`. There is not much performance improvement on Linux due to this PR, which replaces ``` let values_cnt: usize = frame.labels[i ..].iter().map(|label| label.values_cnt).sum(); ``` in `pop_label()` by `pop_cnt + val_cnt`. google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
Based on experiments, these are the smallest masks to pass CI. In other words, `SideTableEntry` only requires `u35` at this point. I'll add the fields from `func_type()` and `func()` in `module.rs` to the side table in google#711, and `SideTableEntry` as `u64` might not be sufficient. google#46 --------- Co-authored-by: Zhou Fang <[email protected]>
google#46 --------- Co-authored-by: Zhou Fang <[email protected]>
I'd like to get some feedback about the design: the "section table" is used to improve the performance of [these accessors](https://github.com/zhouwfang/wasefire/blob/715bbaa6e2f79e40464e9b85ad9fc506c2d02d57/crates/interpreter/src/module.rs#L176-L209) in `module.rs`, which are used in execution. For these three accessors that return `Parser`, we can precompute the `ParseRange` during validation, and create a parser based on the range during execution. I'm not sure about [the other accessors](https://github.com/zhouwfang/wasefire/blob/715bbaa6e2f79e40464e9b85ad9fc506c2d02d57/crates/interpreter/src/module.rs#L119-L174) that return types or even `Option<ExportDesc>`. IIUC, we want to store the section table in flash as the side table. Does that mean we would have to design some encoding rules for these return types? **Update 1** We will only optimize [`func()`](https://github.com/zhouwfang/wasefire/blob/715bbaa6e2f79e40464e9b85ad9fc506c2d02d57/crates/interpreter/src/module.rs#L188) and [`func_type()`](https://github.com/zhouwfang/wasefire/blob/715bbaa6e2f79e40464e9b85ad9fc506c2d02d57/crates/interpreter/src/module.rs#L119) in `module.rs`, because the other accessors should be fast. **Update 2** - Rename the previous `SideTableEntry` as `BranchTableEntry`. - Create per-function `SideTable` to include branch table and other function metadata. google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
CoreMark result: 37.404152 (in 18.937s) Slight performance improvement on Linux compared to [the previous state](google#46 (comment)). google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
The main changes are: - Do not create a branch table entry for unreachable code. This saves space and permits a uniform treatment of branches. - Use the correct stack length for the Loop branch target (we're taking the branch before being in the label, so the current stack needs to be added). - Take the Else source and target branches at the correct time (i.e. when the stack is the expected one such that the stack length is correctly computed).
This way we can make sure which part of it is unsupported, namely the large table.
google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
google#46 --------- Co-authored-by: Zhou Fang <[email protected]> Co-authored-by: Julien Cretin <[email protected]>
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
cc @zhouwfang I'll try to review this when I get time. I'll directly modify the PR as I review it. |
Gentle ping :) |
We currently have:
Logging this before doing the "cursor" change, to see if the |
Final measurements:
|
HWCI nordic passed |
Thanks! |
Thank you! That was a big piece of work! |
This is part of #46 and was developed in the
dev/fast-interp
branch.