-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Remove redundant statistics from FileScanConfig #14955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f3eb12a
to
c819b16
Compare
Looks like there are some CI issues to address Note that @blaginin fixed some issues recently, so if you merge up from main it might be better now |
c819b16
to
bdfb9ac
Compare
i feel it may be easier if we fix #14936 first. I was planning to do it this week, but feel free to take over (just take the issue then) |
bdfb9ac
to
6743003
Compare
hey @Standing-Man #15352 just got merged so this pr may be easier to finish fyi 🌻 |
Signed-off-by: Alan Tang <[email protected]>
9397785
to
2d6e298
Compare
Signed-off-by: Alan Tang <[email protected]> chore: fix fmt and clippy errors
cbeb95e
to
b99e658
Compare
Hi @blaginin, Thanks for your valuable contributions! I’ll continue working on fixing this issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Standing-Man and @blaginin
FYI @xudong963 perhaps you can review this too as you are also working on statistics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the statistics
generated in line 884 will be lost.
I think the idea is that the DataSource has the statistics rather than the FileScanConfig 🤔 |
yeah i know, but i don't see the statistics is set to the datasource. I will check the code again later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion, I saw this line: https://github.com/apache/datafusion/pull/14955/files#diff-6ddd45bba34238ec207080fd56251fffeee8ef263a3ae25be172362bab753a5cL401, so the statistics isn't lost.
Thank you for your contribution!
Thanks again @Standing-Man and @xudong963 |
* enhance: Remove redundant statistics from FileScanConfig Signed-off-by: Alan Tang <[email protected]> * chore: fix some ci error Signed-off-by: Alan Tang <[email protected]> chore: fix fmt and clippy errors --------- Signed-off-by: Alan Tang <[email protected]>
Which issue does this PR close?
Rationale for this change
Both
FileScanConfig
andDataSource
has same statistics, it make that statistics may have out-of-sync bugs.What changes are included in this PR?
Remove redundant statistics from
FileScanConfig
and only a single statistics that held on theDataSource
.Are these changes tested?
Are there any user-facing changes?