feat(array): Add Presto function array_top_n #12105

Open: wants to merge 1 commit into base: main

peterenescu
Contributor

Summary:
Adds the Presto function array_top_n to Velox as a simple function. The function copies the input values into a temporary vector and heap-sorts it to return the k largest values in descending order, where k is the second argument.

Updates ArrayFunctions.h with struct ArrayTopNFunction and adds a new test file, ArrayTopNTest.cpp.

Differential Revision: D68031372
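A minimal sketch of the selection step the summary describes, assuming plain non-null int64_t values. It uses `std::partial_sort` with `std::greater<>`, the same call that appears in the review excerpt, which heap-selects the k largest values and leaves them sorted in descending order. The helper name `topN` is hypothetical, and null handling and floating-point NaN ordering are deliberately left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: returns the n largest values of `values` in
// descending order. Nulls and NaN ordering are deliberately ignored here.
std::vector<int64_t> topN(std::vector<int64_t> values, std::size_t n) {
  // If n exceeds the array length, every element is returned.
  const std::size_t k = std::min(n, values.size());
  // Place the k largest values, sorted descending, in [begin, begin + k);
  // the order of the remaining elements is unspecified.
  std::partial_sort(
      values.begin(), values.begin() + k, values.end(), std::greater<>{});
  values.resize(k);
  return values;
}
```

For example, `topN({3, 1, 4, 1, 5, 9, 2, 6}, 3)` yields `{9, 6, 5}`, and `topN({2, 1}, 5)` returns the whole array as `{2, 1}`.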

@facebook-github-bot added the CLA Signed label Jan 17, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D68031372

netlify bot commented Jan 17, 2025

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 5930e97
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/67a3f92b988f620008c65e0d

});

auto result = evaluate<ArrayVector>(
"if(c0 % 2 = 0, array_top_n(c1, 2), array_top_n(c2, 2))",
Contributor

Can the second parameter be a column? If so, can you add a test for that?

Contributor Author

What's the best way to confirm that?

velox/functions/prestosql/ArrayFunctions.h (outdated)

// Heap sort k values.
std::partial_sort(
heap.begin(), heap.begin() + n, heap.end(), std::greater<>{});
Contributor

We would need special handling for floating-point types (a different comparator); see #9772 for reference.

Contributor Author

Tests seem to pass for floating-point types. Can you point me to a test case I can adapt to confirm?

Contributor

The custom comparator used in #9772 handles NaN values correctly; please add test cases with NaNs. NaNs are considered greater than infinity and equal to one another. See the test cases in #9772 to get an idea.
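Why `std::greater<>` is not enough: every comparison involving NaN returns false, so NaN becomes "equivalent" to every value and the comparator is no longer a strict weak ordering, which means `std::partial_sort` results cannot be relied on. The sketch below is not the comparator from #9772 (which is not reproduced here); it is a hypothetical `nanAwareGreater` that follows the semantics stated in the comment above (NaN greater than +infinity, all NaNs equal), plugged into an equally hypothetical `topNFloating`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical "greater than" for floating point: NaN is greater than every
// other value, including +infinity, and all NaNs compare equal to one
// another. This is a strict weak ordering, so std algorithms accept it.
template <typename T>
bool nanAwareGreater(T a, T b) {
  if (std::isnan(a)) {
    // NaN > any non-NaN; NaN is not greater than another NaN.
    return !std::isnan(b);
  }
  if (std::isnan(b)) {
    // a is not NaN, so a is smaller than b.
    return false;
  }
  return a > b;
}

// Same top-n selection as with std::greater<>, but NaN-aware.
template <typename T>
std::vector<T> topNFloating(std::vector<T> values, std::size_t n) {
  const std::size_t k = std::min(n, values.size());
  std::partial_sort(
      values.begin(), values.begin() + k, values.end(), nanAwareGreater<T>);
  values.resize(k);
  return values;
}
```

With input `{1.0, NAN, INFINITY, -2.0, NAN}` and n = 3, this returns the two NaNs followed by infinity, which is the kind of case the requested tests would cover.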

peterenescu added a commit to peterenescu/velox that referenced this pull request Jan 22, 2025

peterenescu added a commit to peterenescu/velox that referenced this pull request Jan 24, 2025
peterenescu added a commit to peterenescu/velox that referenced this pull request Jan 28, 2025

peterenescu added a commit to peterenescu/velox that referenced this pull request Jan 28, 2025

peterenescu added a commit to peterenescu/velox that referenced this pull request Jan 29, 2025

peterenescu added a commit to peterenescu/velox that referenced this pull request Jan 29, 2025
Pull Request resolved: facebookincubator#12105

int numNull = 0;
for (const auto& item : array) {
if (item.has_value()) {
minHeap.push(item.value());
Contributor

Can you check the value against top() of minHeap before pushing it onto the heap (both here and in the generic implementation)? The current implementation always pushes, which ends up doing many more comparisons and swaps to rebalance the heap than necessary.
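A minimal sketch of the suggested check, assuming non-null int values and a `std::priority_queue` min-heap as a stand-in for the PR's `minHeap` (whose exact type is not shown here). The heap holds at most n elements, so `top()` is the smallest of the current top n; a new value that is not larger than `top()` can never enter the result, so it costs one comparison instead of a push and a later pop. `topNWithHeap` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the n largest values in descending order
// using a bounded min-heap of size <= n.
std::vector<int> topNWithHeap(const std::vector<int>& values, std::size_t n) {
  if (n == 0) {
    // Avoid calling top() on an empty heap below.
    return {};
  }
  std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap;
  for (int v : values) {
    if (minHeap.size() < n) {
      minHeap.push(v);
    } else if (v > minHeap.top()) {
      // Only replace the current minimum when v is strictly larger.
      minHeap.pop();
      minHeap.push(v);
    }
    // Otherwise v cannot be in the top n: skip it without touching the heap.
  }
  // Drain the heap (ascending order) and reverse to get descending output.
  std::vector<int> result;
  result.reserve(minHeap.size());
  while (!minHeap.empty()) {
    result.push_back(minHeap.top());
    minHeap.pop();
  }
  std::reverse(result.begin(), result.end());
  return result;
}
```

The heap never grows beyond n elements, so each accepted value costs O(log n) rather than O(log N) for an array of length N, and rejected values cost a single comparison.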

@peterenescu peterenescu changed the title feat: Add Presto function array_top_n feat(array): Add Presto function array_top_n Feb 5, 2025
Labels: CLA Signed, fb-exported
Projects: None yet

3 participants