Open
Description
Is your feature request related to a problem or challenge?
Query below
SELECT a, ARRAY_AGG(c ORDER BY c)[1] as result
FROM multiple_ordered_table
GROUP BY a;
and
SELECT a, NTH_VALUE(c, 1 ORDER BY c) as result
FROM multiple_ordered_table
GROUP BY a;
produces same results. However, first query generates following plan
"ProjectionExec: expr=[a@0 as a, (ARRAY_AGG(multiple_ordered_table.c) ORDER BY [multiple_ordered_table.c ASC NULLS LAST]@1).[1] as result]",
" AggregateExec: mode=Single, gby=[a@0 as a], aggr=[ARRAY_AGG(multiple_ordered_table.c)], ordering_mode=Sorted",
" CsvExec: file_groups={1 group: [[CSV_PATH]]}, projection=[a, c], output_orderings=[[a@0 ASC NULLS LAST], [c@1 ASC NULLS LAST]], has_header=true",
whereas second query generates following plan
"ProjectionExec: expr=[a@0 as a, NTH_VALUE(multiple_ordered_table.c,Int64(1)) ORDER BY [multiple_ordered_table.c ASC NULLS LAST]@1 as result]",
" AggregateExec: mode=Single, gby=[a@0 as a], aggr=[NTH_VALUE(multiple_ordered_table.c,Int64(1))], ordering_mode=Sorted",
" CsvExec: file_groups={1 group: [[CSV_PATH]]}, projection=[a, c], output_orderings=[[a@0 ASC NULLS LAST], [c@1 ASC NULLS LAST]], has_header=true",
Describe the solution you'd like
we can rewrite first query as second one, which executes faster with less memory. Because it no longer needs to keep all results in the array_agg.
Describe alternatives you've considered
No response
Additional context
No response