-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large OR
list overflows the stack
#9375
Comments
take |
FYI @Lordworms I think this will be a pretty challenging task Also, @peter-toth has an outstanding substantial change to TreeNode APIs here: #8891 which you should be aware of. |
Yes, My basic plan for this is to change recursion in treenode into iteration, I will test my implementation first and read this PR #8891 |
start to do this one, the current plan is to rewrite the infer_placeholder_types function to iteration. |
@Lordworms Are you still working on this? |
Sorry I forget about this one, you can go for it. |
@alamb I'm thinking of introducing #[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct CommutativeExpr {
/// Expressions
pub exprs: Vec<Expr>,
/// The operator that is commutative (order-insensitive), like OR, AND, StringConcat, BitWise Op
pub op: Operator,
}
|
I think that sounds like a good idea to me One thing that would be nice would be to somehow avoid having two potential representions for the same expression So like But that is starting to sound like a large change 🤔 |
we could restrict the minimum elements of CommutativeExpr to 3. |
That could make sense My biggest concern with this proposal is its potential impact on backwards compatibility / causing API churn to solve a very narrow usecase I wonder if you have considered the approach to turning a stack overflow into an error? So like maybe add a configuration flag like "max_expression_depth = 10" or something and then if that depth is exceeded in That would protect against crashes/ stack overflows but still allow people who wanted more complex expressions (and were willing to raise their stack sizes) to run |
I'm able to run the large or list without stack overflow. As far as I can tell. I think we don't need to worry about the
I think we can deal with simple case for the query that has the same operator like large OR list, large AND list, but we can't deal with mixing case like If complete large OR / AND list query for the same operator is worth to add a new Expr, I think we could go for it. |
I recommend we start with the error (to avoid a stack overflow) and then if someone comes with a usecase when they need a super deep I think one common case of large |
It seems to support configurable max recursive depth involved huge breaking change for tree traversal Given stack size is not always the same, I'm not sure constant max recursive depth is a good idea too. |
Here is how the recursion guard is implemented in sqlparser: https://github.com/sqlparser-rs/sqlparser-rs/blob/f9ab8dcc27fd2d55030b9c5fa71e41d5c08dd601/src/parser/mod.rs#L67-L127 And then at the start of each major statement, this gets called: So I don't think we would have to change the |
I came across |
I evaluated stacker as part of sqlparser and I thought it was doing some too crazy stuff that made it hard to use in embedded / wasm environments. Maybe that is better now |
So I got curious and wanted to test #13310 on this example too. It doesn't seem to resolve it, though now instead of a stack overflow I see a |
Bus error is similar to stack overflow in this case. The stacktrace looks like the following with the blowout2 example on latest
So it seems this stack overflow comes from |
Actually, the release build has a different issue. Let me check... |
This is the stacktrace with a release build. Looks like we forgot to annotate
|
Yeah, works if I add |
(alternatively, and more simply, we could convert left- or right-deep OR tree to a balanced one, since it doesn't change semantics) |
Those are nice optimizations, and indeed we should do them, but |
i am not saying we shouldn't do this |
Thank you all for working on this |
BTW the fact that we needed to make get_type recursive that we can spend quite some time recalculating types. If an optimizer code calls expr.get_type, this can easily get quadratic on left-deep or right-deep rexpressions like the one in this issue. |
Describe the bug
In InfluxDB we saw people issue queries with many
OR
chains that caused a stack overflowTo Reproduce
blowout2.zip
Download: blowout.zip
And run
This results in
The query looks like this
Expected behavior
a runtime error rather than stack overflow. Bonus points if the query actually completed
Additional context
Here is the stack trace in a release build:
The text was updated successfully, but these errors were encountered: