Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add #[recursive] #1522

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Add #[recursive] #1522

wants to merge 4 commits into from

Conversation

blaginin
Copy link

@blaginin blaginin commented Nov 14, 2024

Closes #984, related to apache/datafusion#9375 (comment)

Todo:

  • Test the performance overhead on large queries. If it's too high, move it to a separate feature to allow opting out

@blaginin
Copy link
Author

blaginin commented Nov 14, 2024

I was wondering if we should remove RecursionCounter with this PR. In my opinion, we shouldn't, because the ability to limit max recursion might be useful for some users still

@blaginin
Copy link
Author

blaginin commented Nov 15, 2024

It seems this PR doesn't have any significant performance impact.

critcmp main recursive                        
group                                                    main                                   recursive
-----                                                    ----                                   ---------
sqlparser-rs parsing benchmark/format_large_statement    1.00   309.1±12.72µs        ? ?/sec    1.01    312.4±7.45µs        ? ?/sec
sqlparser-rs parsing benchmark/parse_large_statement     1.02      6.3±0.35ms        ? ?/sec    1.00      6.2±0.40ms        ? ?/sec
sqlparser-rs parsing benchmark/parse_sql_large_query     1.00      6.1±0.23ms        ? ?/sec  
sqlparser-rs parsing benchmark/sqlparser::select         1.02  1893.8±82.19ns        ? ?/sec    1.00  1855.1±25.08ns        ? ?/sec
sqlparser-rs parsing benchmark/sqlparser::with_select    1.03     11.7±1.20µs        ? ?/sec    1.00     11.4±0.28µs        ? ?/sec

@blaginin blaginin marked this pull request as ready for review November 15, 2024 19:22
@blaginin
Copy link
Author

FYI, I marked this ready for review. @peter-toth @Eason0729, if you guys want to take a look 🌻

@iffyio
Copy link
Contributor

iffyio commented Nov 16, 2024

@blaginin thanks for looking to fix this!

Currently the preference is to avoid a third-party dependency for this issue, ideally fixing up the parser behavior instead to properly handle deeply nested input. See comment here for a bit more context on rationale
From a quick look at recursive it seems to be a procmacro on top of the stacker library so that the same considerations should apply I imagine.

Copy link
Contributor

@Eason0729 Eason0729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @blaginin. LGTM except for some number in unit test.

I left some comments, and any number above 40000 seems to overflow stack without #[recursive].

@@ -42,6 +42,46 @@ fn basic_queries(c: &mut Criterion) {
group.bench_function("sqlparser::with_select", |b| {
b.iter(|| Parser::parse_sql(&dialect, with_query));
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For large_statement, making separated test would make differentiating potential(future) regression easier.
It's a suggestion(fine to leave it as it is).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understood your comment, sorry. I added tests for parsing large statements in tests/sqlparser_common.rs. Do you think we should test something else?

#[test]
fn overflow() {
let expr = std::iter::repeat("1")
.take(1000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1000 is reasonable for general usage, but 1000 won't overflow without recursive macro. Maybe we could increase it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends on the platform. IMO, the most important thing is that it overflows the stack in the CI


#[test]
fn overflow() {
let cond = (0..1000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason as sqlparser_common::overflow test. Maybe we could increase it(increase the number 1000).

@peter-toth
Copy link

peter-toth commented Nov 16, 2024

Currently the preference is to avoid a third-party dependency for this issue, ideally fixing up the parser behavior instead to properly handle deeply nested input. See comment here for a bit more context on rationale From a quick look at recursive it seems to be a procmacro on top of the stacker library so that the same considerations should apply I imagine.

@iffyio, we had done some research on stacker / recursive in apache/datafusion#13310 to verify concerns before we added it to datafusion: apache/datafusion#13310 (review) / apache/datafusion#13177 (comment)

@Eason0729
Copy link
Contributor

Currently the preference is to avoid a third-party dependency for this issue, ideally fixing up the parser behavior instead to properly handle deeply nested input. See comment here for a bit more context on rationale From a quick look at recursive it seems to be a procmacro on top of the stacker library so that the same considerations should apply I imagine.

@iffyio, we had done some research on stacker / recursive in apache/datafusion#13310 to verify concerns before we added it to datafusion: apache/datafusion#13310 (review) / apache/datafusion#13177 (comment)

Sorry for missing that context when reviewing. So we would like to avoid using recursive for stablility, maybe try...

  1. vender the code from recursive
  2. use stacker directly, like fix: add stacker and maybe_grow on recursion guard #1468

@blaginin
Copy link
Author

Thanks for the review!! 🙂

Sorry for missing that context when reviewing. So we would like to avoid using recursive for stablility, maybe try...

We ended up using #[recursive] in Datafusion, as Peter highlighted. Feels like it’s good to stay consistent across projects?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stack overflow in Statement::to_string for deeply nested expresions
4 participants