-
Notifications
You must be signed in to change notification settings - Fork 747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate pops while ignoring nameless blocks? #6950
base: main
Are you sure you want to change the base?
Conversation
I think this makes sense. I don't think we've considered something like this. (Also we need to fix the new parser to handle this too, because currently it rejects parsing if a But I wonder why was this a problem before, for example, in #4237..? The example in #4237 also an unnamed block, but apparently it caused a validation error in the engine. |
Good catch, the fuzzer found that bug overnight 😄 About the testcase in #4237, looks like it's an unnamed block so we should accept it as valid (rather than fix it up), if we go with this approach. But I'm not sure if I understood your question? |
#4237 seems to say the binary was rejected at compile time from the engine. Does that mean we didn't omit the unnamed blocks in binary then? If we did why would the engine have rejected it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks! I think there are still cases we need fixups but this will remove maybe more than 90% or more of them.
@tlively I could use some advice on how to fix the parser here. This PR requires it to accept code like this: In the last commit I believe I fixed up
IIUC, the pop is in the catch, not in the block. I suspect that is because of binaryen/src/wasm/wasm-ir-builder.cpp Lines 868 to 872 in b381a8c
Does that code need to be changed to put the pop in the right place? I'm not sure how though. |
if (!Type::isSubType(expectedType, type)) { | ||
return Err{std::string("Expected pop of type ") + expectedType.toString()}; | ||
} | ||
return Ok{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change avoids the error, but it doesn't actually move the pop into the right place. Whenever we try to pop from an empty stack, we should look up the stack through nameless blocks to find pops that we could use.
Weird edge case to test:
- We lazily give unnamed blocks names when we encounter branches that target them via an index. What happens if we have previously tried to pull a
pop
through such a block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a situation of popping from an empty stack? Are Pop
instructions not added to the stack like anything else? (they have a concrete type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, pop
s are special. They are automatically pushed onto the stack when the beginning of a catch
is parsed, not when the pop
instruction itself is parsed. That's why the existing code here just does some error checking and doesn't push anything. The reason is that we fully support parsing standard code that does not contain any pop
s, so we can't wait until we see a pop
to push it onto the stack. For example, this parses:
try
catch $e
drop
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the edge case here, we should use a scratch local to pull the value of the pop into the block. That won't help the example I gave below, though, so we still need to relax the validation behavior as well.
(catch $tag | ||
(drop | ||
(block (result i32) | ||
(pop i32) | ||
) | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Wasm doesn't really make sense from a typing perspective unless the block takes an i32 parameter, which we don't yet support. I'm not sure we should want to be able to parse something like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, we probably should not be printing text that looks like this.
I'm working on using IRBuilder in the binary parser now, after which we will be able to update it to support control flow parameters, so we could fix this example to have proper typing, but even then, IRBuilder will lower the control flow parameters away using locals, so we still wouldn't be able to parse this IR directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I see what you mean. But this is consistent with our other extensions to the text format, e.g., this validates:
(module
(func $foo
(local $x (ref func))
(block
(local.set $x
(ref.func $foo)
)
)
(drop
(local.get $x)
)
)
)
That isn't valid wat since the block should end the range of the non-nullable local, but we decided to allow it.
With that said, I do see that this change would be more intrusive and confusion, potentially. So maybe this PR is a bad idea.
But we do have a general problem here, where more passes then we expected need EH fixups. Modifying them all adds clutter and overhead, so I was hoping for a more general solution...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Wasm doesn't really make sense from a typing perspective unless the block takes an i32 parameter, which we don't yet support. I'm not sure we should want to be able to parse something like this.
I don't think I understand. Why does this not make sense unless the block takes an i32 input parameter? What does this have to do with an input parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pop i32
models popping a value off the stack and then pushing it right back onto the stack to be consumed by its parent expression, so it's similar to a nop
, except that it requires a value to be on the stack in the first place. In this example, there would only be a value available on the stack if the block
took an i32
parameter.
In the long run, I hope that the text and binary parsers will be able to lower legacy EH directly to new EH IR, so all the problems with pop validation will become moot. Maybe there are less intrusive changes we can make in the short term?
Perhaps another option instead of this PR is, since this is in old EH, we can ignore the "pop must be in a catch" parser error in the fuzzer..? |
Hmm, I just ran into a situation where IRBuilder already generates a pop in a block.
Generates this IR:
I found this due to test/lit/binary/stack-eh-legacy.test when updating the binary parser to use IRBuilder. The easiest path forward is probably to figure out how to land this PR without causing more problems. The two solutions I see are to do the thing where we reach up the stack and move the pop when parsing a pop in a block or to print nameless blocks as comments so text round tripping newly valid IR does not require the parser to parse a pop in a block. |
IRBuilder is a utility for turning arbitrary valid streams of Wasm instructions into valid Binaryen IR. It is already used in the text parser, so now use it in the binary parser as well. Since the IRBuilder API for building each intruction requires only the information that the binary and text formats include as immediates to that instruction, the parser is now much simpler than before. In particular, it does not need to manage a stack of instructions to figure out what the children of each expression should be; IRBuilder handles this instead. There are some differences between the IR constructed by IRBuilder and the IR the binary parser constructed before this change. Most importantly, IRBuilder generates better multivalue code because it avoids eagerly breaking up multivalue results into individual components that might need to be immediately reassembled into a tuple. It also parses try-delegate more correctly, allowing the delegate to target arbitrary labels, not just other `try`s. There are also a couple superficial differences in the generated label and scratch local names. There are two remaining bugs: First, support for creating DWARF location spans is missing because IRBuilder does not have an API for that yet (but source map locations work fine). Second, IRBuilder generates pops inside nameless blocks in some circumstances involving stacky code. This is currently an IR validation error, so #6950 will have to be resolved before this can land. This change also makes the binary parser significantly slower (by about 50%). The lowest hanging performance fruit seems to be tracking branch targets in IRBuilder to avoid having to scan for branches when finalizing blocks.
@tlively Oh, I was thinking we didn't want to land this? Are you saying it would help the IRBuilder work? For that code sample, running the EH fixups helper should fix things? |
It would fix validation errors that you can get out of IRBuilder today, yes.
Yes, but we probably don't want to do that every time we parse a module, right? |
Yeah, it's the same issue as with passes. We prefer not to run the EH fixups, so we try to run them only when we detect they might be needed, specifically when we add a block around a pop. Some passes have good ways to detect that. If IRBuilder can do so then that might be simpler than this PR, which changes what we accept as valid input. |
Ok, I can take a look at that. |
@aheejin The fuzzer found another issue on pops in blocks (see
test/lit/passes/gto-removals.wast
in this PR), so I started to think that we might want a more general solution here. This draft PR changes validation so that nameless blocks, which vanish in the binary format, are ignored for validation. SomakeSequence()
etc. will never break pop validation.This seems to work on the test suite. Do you remember if we considered this before? Is there a problem I am missing?