Support skip/limit options for pandas scan #4662

Draft
wants to merge 19 commits into
base: master

royi-luo
Collaborator

Description

Add support for skipping rows and limiting the number of rows scanned when scanning from pandas DataFrames.
Also reuse PyarrowScanConfig as the pandas scan config and rename it to something more appropriate, since it is no longer pyarrow-only.
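
For readers unfamiliar with skip/limit semantics, here is a minimal sketch of how the two options would usually combine into a row window. It is not the code in this PR: `ScanConfig`, `row_range`, and the field names `skip` and `limit` are hypothetical, and the clamping behaviour (skipping past the end yields an empty scan) is an assumption about the intended semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanConfig:
    """Hypothetical stand-in for the shared scan config (names are assumptions)."""

    skip: int = 0                # rows to drop from the start of the frame
    limit: Optional[int] = None  # max rows to read after skipping; None = no cap

    def __post_init__(self) -> None:
        if self.skip < 0:
            raise ValueError("skip must be non-negative")
        if self.limit is not None and self.limit < 0:
            raise ValueError("limit must be non-negative")


def row_range(num_rows: int, config: ScanConfig) -> range:
    """Return the half-open range of row indices a scan would visit."""
    # Assumed behaviour: skipping past the end gives an empty range, not an error.
    start = min(config.skip, num_rows)
    if config.limit is None:
        stop = num_rows
    else:
        stop = min(start + config.limit, num_rows)
    return range(start, stop)


rows = row_range(10, ScanConfig(skip=3, limit=4))
print(list(rows))  # [3, 4, 5, 6]
```

For a pandas DataFrame `df`, the same window corresponds to `df.iloc[rows.start:rows.stop]`. Keeping the window computation independent of the data source is one reason a single config type can plausibly serve both the pyarrow and pandas scans.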

Contributor agreement

royi-luo self-assigned this Dec 20, 2024

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 62.50000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 86.50%. Comparing base (970f5c4) to head (8f4d024).
Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/function/table/bind_data.cpp | 60.00% | 2 Missing ⚠️ |
| ...rc/include/function/table/simple_table_functions.h | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4662      +/-   ##
==========================================
- Coverage   86.50%   86.50%   -0.01%     
==========================================
  Files        1369     1372       +3     
  Lines       57955    57999      +44     
  Branches     7203     7209       +6     
==========================================
+ Hits        50136    50174      +38     
- Misses       7652     7657       +5     
- Partials      167      168       +1     


Benchmark Result

Master commit hash: 86f1f4310edbd877f8db75800c6d1ac04d1e058d
Branch commit hash: 742d29ea897eb455fdf9635ba4a6462b7aa72d87

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 672.66 | 647.17 | 25.49 (3.94%) |
| aggregation | q28 | 11565.36 | 11603.98 | -38.62 (-0.33%) |
| filter | q14 | 127.07 | 125.41 | 1.66 (1.32%) |
| filter | q15 | 124.60 | 126.68 | -2.08 (-1.64%) |
| filter | q16 | 315.35 | 298.79 | 16.56 (5.54%) |
| filter | q17 | 446.92 | 447.59 | -0.67 (-0.15%) |
| filter | q18 | 1898.34 | 1944.30 | -45.96 (-2.36%) |
| filter | zonemap-node | 88.81 | 87.27 | 1.54 (1.77%) |
| filter | zonemap-node-lhs-cast | 88.21 | 87.44 | 0.78 (0.89%) |
| filter | zonemap-node-null | 84.34 | N/A | N/A |
| filter | zonemap-rel | 5687.44 | 5709.27 | -21.83 (-0.38%) |
| fixed_size_expr_evaluator | q07 | 571.07 | 581.07 | -10.01 (-1.72%) |
| fixed_size_expr_evaluator | q08 | 798.75 | 809.69 | -10.94 (-1.35%) |
| fixed_size_expr_evaluator | q09 | 802.63 | 810.57 | -7.94 (-0.98%) |
| fixed_size_expr_evaluator | q10 | 238.62 | 244.13 | -5.51 (-2.26%) |
| fixed_size_expr_evaluator | q11 | 230.56 | 235.72 | -5.17 (-2.19%) |
| fixed_size_expr_evaluator | q12 | 225.61 | 237.23 | -11.62 (-4.90%) |
| fixed_size_expr_evaluator | q13 | 1459.21 | 1456.84 | 2.37 (0.16%) |
| fixed_size_seq_scan | q23 | 110.12 | 116.84 | -6.72 (-5.75%) |
| join | q29 | 584.25 | 637.69 | -53.44 (-8.38%) |
| join | q30 | 1568.08 | 1560.85 | 7.23 (0.46%) |
| join | q31 | 6.12 | 5.07 | 1.05 (20.69%) |
| join | SelectiveTwoHopJoin | 52.90 | 56.33 | -3.42 (-6.07%) |
| ldbc_snb_ic | q35 | 2627.34 | 2642.46 | -15.12 (-0.57%) |
| ldbc_snb_ic | q36 | 538.30 | 537.18 | 1.12 (0.21%) |
| ldbc_snb_is | q32 | 5.78 | 5.16 | 0.63 (12.12%) |
| ldbc_snb_is | q33 | 14.87 | 15.62 | -0.74 (-4.77%) |
| ldbc_snb_is | q34 | 1.27 | 1.08 | 0.19 (17.94%) |
| multi-rel | multi-rel-large-scan | 1182.83 | 1234.16 | -51.33 (-4.16%) |
| multi-rel | multi-rel-lookup | 31.99 | 21.75 | 10.24 (47.10%) |
| multi-rel | multi-rel-small-scan | 66.81 | 104.18 | -37.37 (-35.87%) |
| order_by | q25 | 128.39 | 131.89 | -3.50 (-2.65%) |
| order_by | q26 | 449.79 | 449.61 | 0.18 (0.04%) |
| order_by | q27 | 1454.95 | 1474.94 | -19.98 (-1.35%) |
| recursive_join | recursive-join-bidirection | 293.33 | 298.76 | -5.43 (-1.82%) |
| recursive_join | recursive-join-dense | 7386.10 | 7364.18 | 21.92 (0.30%) |
| recursive_join | recursive-join-path | 23637.97 | 23804.54 | -166.57 (-0.70%) |
| recursive_join | recursive-join-sparse | 14513.71 | 14748.68 | -234.97 (-1.59%) |
| recursive_join | recursive-join-trail | 7292.17 | 7293.80 | -1.64 (-0.02%) |
| scan_after_filter | q01 | 170.71 | 169.83 | 0.88 (0.52%) |
| scan_after_filter | q02 | 156.89 | 167.27 | -10.38 (-6.21%) |
| shortest_path_ldbc100 | q37 | 87.73 | 95.98 | -8.25 (-8.60%) |
| shortest_path_ldbc100 | q38 | 364.45 | 348.01 | 16.45 (4.73%) |
| shortest_path_ldbc100 | q39 | 61.37 | 64.77 | -3.40 (-5.25%) |
| shortest_path_ldbc100 | q40 | 438.08 | 421.22 | 16.87 (4.00%) |
| var_size_expr_evaluator | q03 | 2065.91 | 2078.72 | -12.82 (-0.62%) |
| var_size_expr_evaluator | q04 | 2199.12 | 2232.03 | -32.91 (-1.47%) |
| var_size_expr_evaluator | q05 | 2684.76 | 2651.46 | 33.30 (1.26%) |
| var_size_expr_evaluator | q06 | 1342.34 | 1332.11 | 10.23 (0.77%) |
| var_size_seq_scan | q19 | 1450.45 | 1457.11 | -6.66 (-0.46%) |
| var_size_seq_scan | q20 | 2693.46 | 2716.34 | -22.88 (-0.84%) |
| var_size_seq_scan | q21 | 2287.12 | 2292.11 | -4.99 (-0.22%) |
| var_size_seq_scan | q22 | 128.71 | 129.58 | -0.87 (-0.67%) |
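
The Diff column in the benchmark result above appears to be the commit time minus the master time, with the percentage taken relative to master. The benchmark bot does not state this, so the sketch below is an inference from the rows; `benchmark_diff` is a hypothetical helper, not part of the benchmark tooling.

```python
from typing import Optional, Tuple


def benchmark_diff(
    commit_ms: float, master_ms: Optional[float]
) -> Optional[Tuple[float, float]]:
    """Return (absolute diff in ms, percent change relative to master)."""
    # Rows with no master baseline (shown as N/A) have no meaningful diff.
    if master_ms is None or master_ms == 0:
        return None
    diff = commit_ms - master_ms
    return round(diff, 2), round(diff / master_ms * 100, 2)


print(benchmark_diff(672.66, 647.17))  # (25.49, 3.94)  aggregation q24
print(benchmark_diff(84.34, None))     # None           filter zonemap-node-null
```

Recomputing from the rounded times does not always reproduce the posted percentage exactly. For join q31, 1.05 / 5.07 gives about 20.71% against the posted 20.69%, which suggests the bot computes the diff from unrounded means. Negative values mean the branch was faster than master.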
