-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional HLL for distinct value count to StatisticsBuilder #11928
Conversation
✅ Deploy Preview for meta-velox canceled.
|
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oerling overall looks good % nits. Thanks!
/// 'disableNumDistinct' is true, distinct values will not be counted. | ||
explicit StatisticsBuilder( | ||
const StatisticsBuilderOptions& options, | ||
bool disableNumDistinct = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we want to disable num distinct but options set it? And I don't see it has been used. Thanks!
} | ||
|
||
int64_t cardinality() { | ||
VELOX_CHECK(hll_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VELOX_CHECK_NOTNULL
} | ||
} | ||
|
||
int64_t cardinality() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const method?
@@ -93,6 +93,13 @@ void StatisticsBuilder::merge( | |||
// Merge size | |||
mergeCount(size_, other.getSize()); | |||
} | |||
if (hll_) { | |||
if (auto* otherBuilder = dynamic_cast<const StatisticsBuilder*>(&other)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we add VELOX NOT NULL check here? We should expect all other statistic builder has the same type?
@@ -93,6 +93,13 @@ void StatisticsBuilder::merge( | |||
// Merge size | |||
mergeCount(size_, other.getSize()); | |||
} | |||
if (hll_) { | |||
if (auto* otherBuilder = dynamic_cast<const StatisticsBuilder*>(&other)) { | |||
if (otherBuilder->hll_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly shall we check the other build has set hll?
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
*/ | ||
class StatisticsBuilder : public virtual dwio::common::ColumnStatistics { | ||
public: | ||
/// Constructs with 'options'. If 'options' enable count distinct and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the comment here.
@@ -409,7 +444,7 @@ class MapStatisticsBuilder : public StatisticsBuilder, | |||
MapStatisticsBuilder( | |||
const Type& type, | |||
const StatisticsBuilderOptions& options) | |||
: StatisticsBuilder{options}, | |||
: StatisticsBuilder{options.withoutNumDistinct()}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need to disable here?
Adds an optional HLL counter for distinct values to StatisticsBuilder. This is used in Verax sampling to estimate column cardinalities for scalar types.
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
…ookincubator#11928) Summary: Adds an optional HLL counter for distinct values to StatisticsBuilder. This is used in Verax sampling to estimate column cardinalities for scalar types. Pull Request resolved: facebookincubator#11928 Reviewed By: xiaoxmeng Differential Revision: D67565289 Pulled By: oerling fbshipit-source-id: e5cbe96059a932377a3d977bca816be6ec52e038
Adds an optional HLL counter for distinct values to StatisticsBuilder. This is used in Verax sampling to estimate column cardinalities for scalar types.