Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RowContainer's columns' HasNulls info #11741

Open
zsmj2017 opened this issue Dec 4, 2024 · 3 comments
Open

RowContainer's columns' HasNulls info #11741

zsmj2017 opened this issue Dec 4, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@zsmj2017
Copy link

zsmj2017 commented Dec 4, 2024

Description

#11558
and
#10775

Both of these PRs add column-based statistics to RowContainer (the former includes column length information and nullcnt, while the latter only includes nullcnt).

Can we consider merging these two statistics?
From the current perspective, the changes in PR 10775 can be fully covered by PR 11558 (But PR 10775 is always enabled, while PR 11558 provides an option to decide whether to enable it).

I simply think that the statistical information should come from the same data source as much as possible. Dispersed data sources may lead to certain changes missing part of the logic, which could result in APIs that are supposed to have consistent semantics failing to return the same results.

@zsmj2017 zsmj2017 added the enhancement New feature or request label Dec 4, 2024
@zsmj2017 zsmj2017 closed this as not planned Won't fix, can't repro, duplicate, stale Dec 4, 2024
@xiaoxmeng
Copy link
Contributor

@zsmj2017 good point. We can consider to support this. cc @tanjialiang @zation99

@xiaoxmeng xiaoxmeng reopened this Dec 4, 2024
@zation99 zation99 self-assigned this Dec 4, 2024
@zation99
Copy link
Contributor

zation99 commented Dec 10, 2024

@xiaoxmeng @tanjialiang one decision we need to make is, after we merge these two places to add the stats, do we want to enable it by default as what #10775 does, or make it disabled by default as what #11558 does?
Another choice is that we only make this stats nullcnt enabled by default but others disabled by default. Not sure if the effort worths it though.
The problem is if we keep all of the stats disabled as in #11558 after the merge, there will be performance regression in #10775 cases.
What do you think?

@zation99
Copy link
Contributor

nvm @tanjialiang pointed me to the recent PR that enabled col stats by default: #11731

zation99 added a commit to zation99/velox that referenced this issue Dec 14, 2024
Summary:
PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 14, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 14, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 14, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 15, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 15, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 15, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 15, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 15, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 17, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 18, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 18, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 18, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 18, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 18, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 18, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 19, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 19, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 19, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Reviewed By: xiaoxmeng

Differential Revision: D67229925
zation99 added a commit to zation99/velox that referenced this issue Dec 19, 2024
…ator#11860)

Summary:

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Reviewed By: xiaoxmeng

Differential Revision: D67229925
facebook-github-bot pushed a commit that referenced this issue Dec 19, 2024
Summary:
Pull Request resolved: #11860

PR to address #11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Reviewed By: xiaoxmeng

Differential Revision: D67229925

fbshipit-source-id: ef66a1a419942f0b1d699749849c19e05be378e7
athmaja-n pushed a commit to athmaja-n/velox that referenced this issue Jan 10, 2025
…ator#11860)

Summary:
Pull Request resolved: facebookincubator#11860

PR to address facebookincubator#11741

- Removed the use of columnHasNulls in RowContainer and replaced them with row stats
- Separate null count/sumBytes from minmax. In the case of rows erasure, only min/max is invalidated.

Reviewed By: xiaoxmeng

Differential Revision: D67229925

fbshipit-source-id: ef66a1a419942f0b1d699749849c19e05be378e7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants