Description
Describe the bug
Maybe related to #5535, but I couldn't find anything identical, so created a fresh issue.
If this is a known bug and you think the fix might be moderate in scope, I'm happy to have a go at fixing it.
To Reproduce
I have a custom TableProvider and ExecutionPlan, where calling execute is somewhat expensive, and I want to avoid calling it if no data will match.
The execution plan can return helpful statistics from .statistics(), including, for example, for one column:
...
ColumnStatistics {
null_count: Precision::Exact(0),
max_value: Precision::Exact(ScalarValue::Int64(Some(4))),
min_value: Precision::Exact(ScalarValue::Int64(Some(4))),
distinct_count: Precision::Exact(1),
},
E.g. "in this column all values are equal to 4". This is successfully used by DataFusion: if I query value is null, the execute() function is never called.
But if I query value > 5 or value < 0, the statistic is ignored and execute() is still called.
Expected behavior
min_value and max_value of ColumnStatistics should be used for pruning, and the query plan should not require the "slow" execute method to be called.
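To make the expected check concrete, here is a minimal standalone sketch of the min/max reasoning I have in mind. It does not use any DataFusion types: ExactRange, Predicate and can_skip_execute are hypothetical names invented for illustration, and the sketch assumes both bounds are exact (the Precision::Exact case above) and the column has no nulls.

```rust
// Hypothetical, simplified stand-ins for illustration only.
// These are NOT DataFusion types or APIs.

/// Exact lower and upper bounds for a non-null Int64 column.
#[derive(Debug, Clone, Copy)]
struct ExactRange {
    min: i64,
    max: i64,
}

/// A single comparison of the column against a literal.
#[derive(Debug, Clone, Copy)]
enum Predicate {
    Gt(i64), // value > literal
    Lt(i64), // value < literal
    Eq(i64), // value = literal
}

/// Returns true when the exact bounds prove that no row can satisfy
/// the predicate, so the expensive execute call could be skipped.
fn can_skip_execute(range: ExactRange, predicate: Predicate) -> bool {
    match predicate {
        // No value exceeds `max`, so `value > v` is impossible when max <= v.
        Predicate::Gt(v) => range.max <= v,
        // No value is below `min`, so `value < v` is impossible when min >= v.
        Predicate::Lt(v) => range.min >= v,
        // `value = v` is impossible when v lies outside [min, max].
        Predicate::Eq(v) => v < range.min || v > range.max,
    }
}
```

With the statistics from this report (min = max = 4), both value > 5 and value < 0 would be proven empty, while value > 3 would still need execute.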
Additional context
I can give a fairly minimal example if required, but I thought best to report the issue and check if it was well known before going to that effort?
I've tried this on both main (as of today) and 37.1.0.