What happened:
Our tables write checkpoints with statistics stored as structs only: delta.checkpoint.writeStatsAsStruct = true and delta.checkpoint.writeStatsAsJson = false.
After a checkpoint, calling add_actions_table returns no statistics.
What you expected to happen:
I expect add_actions_table to return statistics regardless of which checkpoint is the latest and how the stats were written to it.
How to reproduce it:
Configure a table with delta.checkpoint.writeStatsAsStruct = true and delta.checkpoint.writeStatsAsJson = false
Write data
Create a checkpoint
Call add_actions_table
Observe that no stats are present
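The steps above can be sketched with the Python binding as follows. This is a minimal sketch, not a verified reproduction: the table contents, the `reproduce` helper name, and the `STRUCT_ONLY_STATS` constant are illustrative assumptions, and it assumes `get_add_actions` is the Python counterpart of `add_actions_table`.

```python
# Table properties from the report: struct stats enabled, JSON stats disabled.
STRUCT_ONLY_STATS = {
    "delta.checkpoint.writeStatsAsStruct": "true",
    "delta.checkpoint.writeStatsAsJson": "false",
}


def reproduce(table_uri: str):
    """Write a small table, checkpoint it, and return its add actions.

    `reproduce` is a hypothetical helper for illustration only.
    """
    # Imported lazily so the constant above is usable without these packages.
    import pyarrow as pa
    from deltalake import DeltaTable, write_deltalake

    # Illustrative data; any table with a few columns should do.
    data = pa.table({"id": [1, 2, 3], "value": ["a", "b", "c"]})
    write_deltalake(table_uri, data, configuration=STRUCT_ONLY_STATS)

    dt = DeltaTable(table_uri)
    dt.create_checkpoint()

    # Reload the table so that the state is loaded after the checkpoint exists.
    dt = DeltaTable(table_uri)

    # Inspect the returned columns for the per-file statistics.
    return dt.get_add_actions(flatten=True)
```

Calling `reproduce("/tmp/stats_struct_only")` (an arbitrary local path) and inspecting the stats columns of the result should show whether the statistics survived the checkpoint.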
More details:
The log_data method is probably usable here for add_actions_table, since it already holds the data in Arrow format and hydrates stats regardless of how they are represented in checkpoints.
It would just need a method on FileStatsAccessor to build a record batch out of its internal columns.
As a workaround, I can probably enable JSON stats in addition to struct stats in checkpoints, at little overhead.
Our use case: we make add_actions_table queryable with DataFusion to provide a SQL function for exploring Delta table stats.
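The workaround mentioned above (writing JSON stats alongside struct stats) might look like this with the Python binding. It is a sketch under assumptions: the `enable_json_stats` helper and the `JSON_AND_STRUCT_STATS` constant are hypothetical names, and it assumes a new checkpoint has to be written before the JSON stats appear in it.

```python
# Both representations enabled, so readers relying on JSON stats still see them.
JSON_AND_STRUCT_STATS = {
    "delta.checkpoint.writeStatsAsStruct": "true",
    "delta.checkpoint.writeStatsAsJson": "true",
}


def enable_json_stats(table_uri: str) -> None:
    """Turn on JSON stats for an existing table and write a fresh checkpoint.

    `enable_json_stats` is a hypothetical helper for illustration only.
    """
    # Imported lazily so the constant above is usable without the package.
    from deltalake import DeltaTable

    dt = DeltaTable(table_uri)
    # Commits a metadata change that updates the table properties.
    dt.alter.set_table_properties(JSON_AND_STRUCT_STATS)
    # Existing checkpoints are not rewritten, so create a new one.
    dt.create_checkpoint()
```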
Environment
Delta-rs version: 0.25.5
Binding: Rust, Python
Environment:
Bug
After a checkpoint, add_actions_table appears to read only stats on Adds vs including stats_parsed as well: https://github.com/delta-io/delta-rs/blob/python-v0.25.5/crates/core/src/table/state_arrow.rs#L98